Hive
Hive Function: Built-in & UDF (User Defined Functions)
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic,...
Hive as an ETL and data warehousing tool on top of Hadoop ecosystem provides functionalities like Data modeling, Data manipulation, Data processing and Data querying. Data Extraction in Hive means the creation of tables in Hive and loading structured and semi structured data as well as querying data based on the requirements.
For batch processing, we are going to write custom defined scripts using a custom map and reduce scripts using a scripting language. It provides SQL like environment and support for easy querying.
In this tutorial, you will learn-
Structured Data means that data is in the proper format of rows and columns. This is more of like RDBMS data with proper rows and columns.
Here we are going to load structured data present in text files in Hive
Step 1) In this step we are creating table "employees_guru" with column names such as Id, Name, Age, Address, Salary and Department of the employees with data types.
From the above screenshot, we can observe the following,
Step 2) In this step we are displaying the contents stored in this table by using "Select" command. We can observe the table contents in the following screen shot.
- Sample code Snippet
Queries to be performed
1) Create table employees_guru(Id INT, Name STRING, Age INT, Address STRING, Salary FLOAT, Department STRING) > Row format delimited > Fields terminated by ','; 2) load data local inpath '/home/hduser/Employees.txt' into TABLE employees_guru; 3) select * from employees_guru;
Hive performs ETL functionalities in Hadoop ecosystem by acting as ETL tool. It can be difficult to perform map reduce in some type of applications, Hive can reduce the complexity and provides the best solution to the IT applications in terms of data warehousing sector.
Semi structured data such as XML and JSON can be processed with less complexity using Hive. First we will see how we can use Hive for XML.
XML TO HIVE TABLE
In this, we are going to load XML data into Hive tables, and we will fetch the values stored inside the XML tags.
Step 1) Creation of Table "xmlsample_guru" with str column with string data type.
From the above screenshot, we can observe the following
Step 2) Using XPath () method we will be able to fetch the data stored inside XML tags.
From the above screenshot, we can observe the following
Step 3) In this step, we will fetch and display the Raw XML of table "xmlsample_guru."
From the above screenshot, we can observe the following
Code Snippet:
Queries to be performed
1) create table xmlsample_guru(str string); 2) load data local inpath '/home/hduser/test.xml' overwrite into table xmlsample_guru; 3) select xpath(str,'emp/ename/text()'), xpath(str,'emp/esal/text()') from xmlsample_guru;
JSON (JavaScript Object Notation)
Twitter and websites data is stored in JSON format. Whenever we try to fetch data from online servers it will return JSON files. Using Hive as data store we can able to load JSON data into Hive tables by creating schemas.
JSON TO HIVE TABLE
In this, we are going to load JSON data into Hive tables, and we will fetch the values stored in JSON schema.
Step 1) In this step, we are going to create JSON table name "json_guru". Once created loading and displaying contents of the actual schema.
From the above screenshot, we can observe the following
Step 2) Using get_json_object() Method we can able to fetch the Data values stored in JSON hierarchy
From the above screen shot we can observe the following
Code Snippet
Queries to be performed
1) create table json_guru(str string); 2) load data inpath 'home/hduser/test.json' into table json_guru; 3) select * from json1; 4) select get_json_object(str,'$.ecode') as ecode, get_json_object(str,'$.ename') as ename ,get_json_object(str,'$.sal') as salary from json_guru;
Complex JSON TO HIVE TABLE
In this, we are going to load Complex JSON data into Hive tables, and we will fetch the values stored in JSON schema
Step 1) Creating complexjson_guru with single column field
From the above screenshot, we can observe the following
Step 2) By using get_json_object we can retrieve the actual content that stored inside of JSON file hierarchy.
From the following screenshot, we can able to see the output of the data stored in complexjson_guru.
Step 3) In this step, by using "Select" command we actually able to see complex JSON data stored inside table "complexjson_guru"
-Sample Code Snippet,
Queries to be performed
1) create table complexjson_guru(json string); 2) load data inpath 'home/hduser/emp.json' into table complexjson_guru; 3) select get_json_object(json,'$.ecode') as ecode ,get_json_object(json,'$.b') as code, get_json_object(json,'$.c') from complexjson_guru; 4) select * from complexjson_guru;
When and Where to Use Hive on Hadoop Ecosystem:
When
Where
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic,...
Prior to Apache Hive installation we require dedicated Hadoop installation, up and running with...
Tables, Partitions, and Buckets are the parts of Hive data modeling. What is Partitions? Hive...
Why to Use MySQL in Hive as Metastore: By Default, Hive comes with derby database as metastore. Derby...
In this tutorial, you will learn- Join queries Different type of joins Sub queries Embedding custom...
What is a View? Views are similar to tables, which are generated based on the requirements. We can...