Prior to Hive installation we require dedicated Hadoop installation, up and running all the Hadoop daemons.
For Installation of Hadoop check this link
Once all the Hadoop daemons are working fine, just start Hive installation part.
Hive is developed on top of Hadoop as its data warehouse framework for querying and analysis of data that is stored in HDFS. Hive is an open source-software that lets programmers analyze large data sets on Hadoop.
In this tutorial, you will learn-
Step 1) Downloading and Installing Hive
For downloading Hive stable version refer apache URL as mentioned below
Go to the URL location and Click on the link to download hive tar file.
Step 2) Extracting the tar file.
Go to the downloaded Tar file location ->extract the tar file by using the following command
tar –xvf apache-hive-1.2.0-bin.tar.gz
Step 3) Different Configuration properties to be placed in Hive.
In this step, we are going to do two things
- Placing Hive Home path in bashrc file
- Placing Hadoop Home path location in hive-config.sh
- Mention Hive Pathin ~/.bashrc
- Open the bashrc file as shown in above screenshot
- Mention Hive home path i.e., HIVE_HOME path in bashrc file and export it as shown in below
Code to be placed in bashrc
- Exporting Hadoop path in Hive-config.sh ( To communicate with the Hadoop eco system we are defining Hadoop Home path in hive config field)
Open the hive-config.sh as shown in below
Mention the HADOOP_HOME Path in hive-config.sh file as shown in below ( HADOOP_HOME Path)
Step 4) Creating Hive directories in Hadoop:
To communicate with Hadoop, we need to create directories in Hadoop as shown below.
Giving root permissions to create Hive folders in Hadoop.If it doesn't throw any error message, then it means that Hadoop has successfully given permissions to Hive folders.
Step 5) Getting into Hive shell by entering '. /hive' command as shown in below.
Here we are going to create sample table using Hive shell command "create" with column names.
Sample Code for creating data base in Hive
From the above screen shot we can observe the following:
- Creation of Sample Table with column names in Hive
- Here the table name is "product" with three column names product, pname, and price
- The three column names denoted by their respective data type
- All fields are terminated by coma ', '
- Displaying Hive Table information
- Using "describe" command we can able to see the table information present in Hive
- Here it is displaying column names with their respective data types present in table schema
- At the end, it will display time to perform this command and number of rows it fetched
Sample Code for creating data base in Hive (For self check )
1) Create table product(product int, pname string, price float)
Row format delimited
Fields terminated by ',';
Why to Use MySQL in Hive as Meta store:
- By Default, Hive comes with derby database as metastore.
- Derby database can support only single active user at a time
- Derby is not recommended in production environment
So the solution here is
- Use MYSQL as Meta storage at backend to connect multiple users with Hive at a time
- MYSQL is Best choice for the standalone metastore
Steps to Install and configure MySQL database in Hive on Hadoop
Step 1) In this step, we are going to perform two tasks
- Installation of mysql-server
- Checking the mysql-server and its process
- Using sudo apt-get install mysql-server command, we can download mysql server
Install MySQL as shown in the screenshot
- After successful installation at end, MySQL will run as shown in below screen shot
Step 2) Installing MySQL Java Connector. This is for java dependencies and connection purpose
Step 3) Creating soft link for connector in Hive lib directory. This is for soft link between Java and MySql.
Step 4) Configuring MySql storage in Hive
- Type MySql –u root –p followed by password
- Here –u represents root username , p denotes password
- After entering the above command, the user has to enter valid password and then click enter
- Then it will enter into MySql shell mode
Step 5) Creating username and password for MySql, granting privileges.
We have to execute the commands as shown below,
mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
mysql> GRANT all on *.* to 'hiveuser'@localhost identified by 'hivepassword';
mysql> flush privileges;
Step 6) Configuring hive-site.xml
- After Step 5 assign username and password to MySQL database and given privileges.
- Here we will configure some properties in Hive to get a connection with MySQL database.
From the above screenshot, we observe the following. Here we are defining 4 properties that could be necessary to establish MYSQL as Meta store in Hive
These are as follows:
- This property is for connection URL purpose. Here we are defining ConnectionURL in this property. It acts as JDBC connection and its representing metastore location as well
- This property is for Connection driver name. Here mysql.jdbc.Driver is the respected value we have to mention in the value tag
- This property used for defining Connection User name. In this, we defined "hiveguru" as user name
- This property used for mentioning Connection Password. In this, we defined password as user password.
Once the properties placed in hive –site.xml we have to manually save (Ctrl+S) and close the file. After closing this file, we have to create Hive table and check the table details in MySQL storage.
Place this code in hive-site.xml
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> <description>user name for connecting to mysql server</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>hivepassword</value> <description>password for connecting to mysql server</description> </property> </configuration>
Step7) Create table "guru99" in Hive.
From the above screenshot, we can observe the following
- Creation of table name"guru99" with two column names
- The columns names mentioned with its data type as one is integer, and another one is of string type
In the next step, we are going to check whether it is stored in MySql or not
Step 8) Entering into MySql shell mode
From the above screenshot, we can observe the following
- First we have to use the database as "use metastore"
- Once it chooses meta store we can check the tables present in this by using "show" tables command as shown in the screenshot
- Whatever the tables that are created in Hive, the metadata corresponds to that tables are stored under TBLS in MySQL database.
- "Guur99 table" is created in Hive, so the corresponding metadata is stored in MySQL under TBLS.
Step 9) Checking whether created table is presenting MySQL or Not
By entering select * from TBLS, it is going to display the tables that we created in Hive shell mode
From the above screenshot we can observe following things:
- Table name "guru99" that created is the Hive can be displayed in MySQL shell mode
- Beside this, it will also provide information like table creation time, accessed time and other properties as shown in screen shot above.