Hive
How to Download & Install HIVE on Ubuntu
Prior to Apache Hive installation we require dedicated Hadoop installation, up and running with...
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic, logical and relational on the operands of table column names.
These are functions that already available in Hive. First, we have to check the application requirement, and then we can use this built in functions in our applications. We can call these functions directly in our application.
The syntax and types are mentioned in the following section.
Types of Built-in Functions in HIVE
Collection Functions:
These functions are used for collections. Collections mean the grouping of elements and returning single or array of elements depends on return type mentioned in function name.
Return Type | Function Name | Description |
---|---|---|
INT | size(Map<K.V>) | It will fetch and give the components number in the map type |
INT | size(Array<T>) | It will fetch and give the elements number in the array type |
Array<K> | Map_keys(Map<K.V>) | It will fetch and gives an array containing the keys of the input map. Here array is in unordered |
Array<V> | Map_values(Map<K.V>) | It will fetch and gives an array containing the values of the input map. Here array is in unordered |
Array<t> | Sort_array(Array<T>) | sorts the input array in ascending order of array and elements and returns it |
Date Functions:
These are used to perform Date Manipulations and Conversion of Date types from one type to another type:
Function Name | Return Type | Description |
---|---|---|
Unix_Timestamp() | BigInt | We will get current Unix timestamp in seconds |
To_date(string timestamp) | string | It will fetch and give the date part of a timestamp string: |
year(string date) | INT | It will fetch and give the year part of a date or a timestamp string |
quarter(date/timestamp/string) | INT | It will fetch and give the quarter of the year for a date, timestamp, or string in the range 1 to 4 |
month(string date) | INT | It will give the month part of a date or a timestamp string |
hour(string date) | INT | It will fetch and gives the hour of the timestamp |
minute(string date) | INT | It will fetch and gives the minute of the timestamp |
Date_sub(string starting date, int days) | string | It will fetch and gives Subtraction of number of days to starting date |
Current_date | date | It will fetch and gives the current date at the start of query evaluation |
LAST _day(string date) | string | It will fetch and gives the last day of the month which the date belongs to |
trunc(string date, string format) | string | It will fetch and gives date truncated to the unit specified by the format. Supported formats in this : MONTH/MON/MM, YEAR/YYYY/YY. |
Mathematical Functions:
These functions are used for Mathematical Operations. Instead of creatingUDFs , we have some inbuilt mathematical functions in Hive.
Function Name | Return Type | Description |
---|---|---|
round(DOUBLE X) | DOUBLE | It will fetch and returns the rounded BIGINT value of X |
round(DOUBLE X, INT d) | DOUBLE | It will fetch and returns X rounded to d decimal places |
bround(DOUBLE X) | DOUBLE | It will fetch and returns the rounded BIGINT value of X using HALF_EVEN rounding mode |
floor(DOUBLE X) | BIGINT | It will fetch and returns the maximum BIGINT value that is equal to or less than X value |
ceil(DOUBLE a), ceiling(DOUBLE a) | BIGINT | It will fetch and returns the minimum BIGINT value that is equal to or greater than X value |
rand(), rand(INT seed) | DOUBLE | It will fetch and returns a random number that is distributed uniformly from 0 to 1 |
Conditional Functions:
These functions used for conditional values checks.
Function Name | Return Type | Description |
---|---|---|
if(Boolean testCondition, T valueTrue, T valueFalseOrNull) | T | It will fetch and gives value True when Test Condition is of true, gives value False Or Null otherwise. |
ISNULL( X) | Boolean | It will fetch and gives true if X is NULL and false otherwise. |
ISNOTNULL(X ) | Boolean | It will fetch and gives true if X is not NULL and false otherwise. |
String Functions:
String manipulations and string operations these functions can be called.
Function Name | Return Type | Description |
---|---|---|
reverse(string X) | string | It will give the reversed string of X |
rpad(string str, int length, string pad) | string | It will fetch and gives str, which is right-padded with pad to a length of length(integer value) |
rtrim(string X) | string | It will fetch and returns the string resulting from trimming spaces from the end (right hand side) of X For example, rtrim(' results ') results in ' results' |
space(INT n) | string | It will fetch and gives a string of n spaces. |
split(STRING str, STRING pat) | array | Splits str around pat (pat is a regular expression). |
Str_to_map(text[, delimiter1, delimiter2]) | map<String ,String> | It will split text into key-value pairs using two delimiters. |
In Hive, the users can define own functions to meet certain client requirements. These are known as UDFs in Hive. User Defined Functions written in Java for specific modules.
Some of UDFs are specifically designed for the reusability of code in application frameworks. The developer will develop these functions in Java and integrate those UDFs with the Hive.
During the Query execution, the developer can directly use the code, and UDFs will return outputs according to the user defined tasks. It will provide high performance in terms of coding and execution.
For example, for string stemming we don't have any predefined function in Hive, for this we can write stem UDF in Java. Wherever we require Stem functionality, we can directly call this Stem UDF in Hive.
Here stem functionality means deriving words from its root words. It is like stemming algorithm reduces the words "wishing", "wished", and "wishes" to the root word "wish." For performing this type functionality, we can write UDF in java and integrate with Hive.
Depending on the use cases the UDFs can be written, it will accept and produce different numbers of input and output values.
The general type of UDF will accept single input value and produce a single output value. If the UDF used in the query, then UDF will be called once for each row in the result data set.
In the other way, it can accept a group of values as input and return single output value as well.
Prior to Apache Hive installation we require dedicated Hadoop installation, up and running with...
Hive as an ETL and data warehousing tool on top of Hadoop ecosystem provides functionalities like...
Training Summary Apache Hive helps with querying and managing large datasets real fast. It is an...
Tables, Partitions, and Buckets are the parts of Hive data modeling. What is Partitions? Hive...
What is HiveQL(Hive Query Language)? Hive provides a CLI to write Hive queries using Hive Query...
In this tutorial, you will learn- What is Hive? Hive Architecture Different modes of Hive What is...