Hive
Hive Function: Built-in & UDF (User Defined Functions)
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic,...
In this tutorial, you will learn-
Join queries can perform on two tables present in Hive. For understanding Join Concepts in clear here we are creating two tables overhere,
Step 1) Creation of table "sample_joins" with Column names ID, Name, Age, address and salary of the employees
Step 2) Loading and Displaying Data
From the above screen shot
Step 3) Creation of sample_joins1 table and loading, displaying data
From the above screenshot, we can observe the following
Moving forward we will see different types of joins that can be performed on tables we have created but before that you have to consider following points for joins.
Some points to observe in Joins:
Joins are of 4 types, these are
Inner Join:
The Records common to the both tables will be retrieved by this Inner Join.
From the above screenshot, we can observe the following
Query:
SELECT c.Id, c.Name, c.Age, o.Amount FROM sample_joins c JOIN sample_joins1 o ON(c.Id=o.Id);
Left Outer Join:
From the above screenshot, we can observe the following
For example here we are using employee id as a reference, it checks whether id is common in right as well as left the table or not. It acts as matching condition.
NULL values in the above output are columns with no values from Right table that is sample_joins1
Query:
SELECT c.Id, c.Name, o.Amount, o.Date1 FROM sample_joins c LEFT OUTER JOIN sample_joins1 o ON(c.Id=o.Id)
Right outer Join:
From the above screenshot, we can observe the following
Query:
SELECT c.Id, c.Name, o.Amount, o.Date1 FROM sample_joins c RIGHT OUTER JOIN sample_joins1 o ON(c.Id=o.Id)
Full outer join:
It combines records of both the tables sample_joins and sample_joins1 based on the JOIN Condition given in query.
It returns all the records from both tables and fills in NULL Values for the columns missing values matched on either side.
From the above screen shot we can observe the following:
Query
SELECT c.Id, c.Name, o.Amount, o.Date1 FROM sample_joins c FULL OUTER JOIN sample_joins1 o ON(c.Id=o.Id)
A Query present within a Query is known as a sub query. The main query will depend on the values returned by the subqueries.
Subqueries can be classified into two types
When to use:
Syntax:
Subquery in FROM clause SELECT <column names 1, 2…n>From (SubQuery) <TableName_Main > Subquery in WHERE clause SELECT <column names 1, 2…n> From<TableName_Main>WHERE col1 IN (SubQuery);
Example:
SELECT col1 FROM (SELECT a+b AS col1 FROM t1) t2
Here t1 and t2 are table names. The colored one is Subquery performed on table t1. Here a and b are columns that are added in a subquery and assigned to col1. Col1 is the column value present in Main table. This column "col1" present in the subquery is equivalent to the main table query in column col1.
Hive provides feasibility of writing user specific scripts for the client requirements. The users can able to write their own map and reduce scripts for the requirements. These are called Embedded Custom scripts. The coding logic is defined in the custom scripts and we can use that script in the ETL time.
When to choose Embedded Scripts:
For this in Hive it uses TRANSFORM clause to embedded both map and reducer scripts.
In this Embedded custom scripts, we have to observe the following points
Sample Embedded Script,
FROM ( FROM pv_users MAP pv_users.userid, pv_users.date USING 'map_script' AS dt, uid CLUSTER BY dt) map_output INSERT OVERWRITE TABLE pv_users_reduced REDUCE map_output.dt, map_output.uid USING 'reduce_script' AS date, count;
From the above script, we can observe the following
This is only the sample script for understanding
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic,...
In this tutorial, you will learn- What is Hive? Hive Architecture Different modes of Hive What is...
Hive provides SQL type querying language for the ETL purpose on top of Hadoop file system. Hive Query...
Training Summary Apache Hive helps with querying and managing large datasets real fast. It is an...
Why to Use MySQL in Hive as Metastore: By Default, Hive comes with derby database as metastore. Derby...
What is HiveQL(Hive Query Language)? Hive provides a CLI to write Hive queries using Hive Query...