Hive Functions: Built-in & UDF [User Defined Functions] Example
Functions are built for a specific purpose to perform operations like Mathematical, arithmetic, logical, and relational on the operands of table column names.
Built-in functions
These are functions that are already available in Hive. First, we have to check the application requirement, and then we can use these built-in functions in our applications. We can call these functions directly in our application.
The syntax and types are mentioned in the following section.
Types of Built-in Functions in HIVE
- Collection Functions
- Date Functions
- Mathematical Functions
- Conditional Functions
- String Functions
- Misc. Functions
Collection Functions
These functions are used for collections. Collections mean the grouping of elements and returning single or array of elements depends on return type mentioned in function name.
Return Type | Function Name | Description |
---|---|---|
INT | size(Map<K.V>) | It will fetch and give the components number in the map type |
INT | size(Array<T>) | It will fetch and give the elements number in the array type |
Array<K> | Map_keys(Map<K.V>) | It will fetch and gives an array containing the keys of the input map. Here array is in unordered |
Array<V> | Map_values(Map<K.V>) | It will fetch and gives an array containing the values of the input map. Here array is in unordered |
Array<t> | Sort_array(Array<T>) | sorts the input array in ascending order of array and elements and returns it |
Date Functions
These are used to perform Date Manipulations and Conversion of Date types from one type to another type:
Function Name | Return Type | Description |
---|---|---|
Unix_Timestamp() | BigInt | We will get current Unix timestamp in seconds |
To_date(string timestamp) | string | It will fetch and give the date part of a timestamp string: |
year(string date) | INT | It will fetch and give the year part of a date or a timestamp string |
quarter(date/timestamp/string) | INT | It will fetch and give the quarter of the year for a date, timestamp, or string in the range 1 to 4 |
month(string date) | INT | It will give the month part of a date or a timestamp string |
hour(string date) | INT | It will fetch and gives the hour of the timestamp |
minute(string date) | INT | It will fetch and gives the minute of the timestamp |
Date_sub(string starting date, int days) | string | It will fetch and gives Subtraction of number of days to starting date |
Current_date | date | It will fetch and gives the current date at the start of query evaluation |
LAST _day(string date) | string | It will fetch and gives the last day of the month which the date belongs to |
trunc(string date, string format) | string | It will fetch and gives date truncated to the unit specified by the format.
Supported formats in this : MONTH/MON/MM, YEAR/YYYY/YY. |
Mathematical Functions
These functions are used for Mathematical Operations. Instead of creatingUDFs , we have some inbuilt mathematical functions in Hive.
Function Name | Return Type | Description |
---|---|---|
round(DOUBLE X) | DOUBLE | It will fetch and returns the rounded BIGINT value of X |
round(DOUBLE X, INT d) | DOUBLE | It will fetch and returns X rounded to d decimal places |
bround(DOUBLE X) | DOUBLE | It will fetch and returns the rounded BIGINT value of X using HALF_EVEN rounding mode |
floor(DOUBLE X) | BIGINT | It will fetch and returns the maximum BIGINT value that is equal to or less than X value |
ceil(DOUBLE a), ceiling(DOUBLE a) | BIGINT | It will fetch and returns the minimum BIGINT value that is equal to or greater than X value |
rand(), rand(INT seed) | DOUBLE | It will fetch and returns a random number that is distributed uniformly from 0 to 1 |
Conditional Functions
These functions used for conditional values checks.
Function Name | Return Type | Description |
---|---|---|
if(Boolean testCondition, T valueTrue, T valueFalseOrNull) | T | It will fetch and gives value True when Test Condition is of true, gives value False Or Null otherwise. |
ISNULL( X) | Boolean | It will fetch and gives true if X is NULL and false otherwise. |
ISNOTNULL(X ) | Boolean | It will fetch and gives true if X is not NULL and false otherwise. |
String Functions
String manipulations and string operations these functions can be called.
Function Name | Return Type | Description |
---|---|---|
reverse(string X) | string | It will give the reversed string of X |
rpad(string str, int length, string pad) | string | It will fetch and gives str, which is right-padded with pad to a length of length(integer value) |
rtrim(string X) | string | It will fetch and returns the string resulting from trimming spaces from the end (right hand side) of X For example, rtrim(‘ results ‘) results in ‘ results’ |
space(INT n) | string | It will fetch and gives a string of n spaces. |
split(STRING str, STRING pat) | array | Splits str around pat (pat is a regular expression). |
Str_to_map(text[, delimiter1, delimiter2]) | map<String ,String> | It will split text into key-value pairs using two delimiters. |
UDFs (User Defined Functions)
In Hive, the users can define own functions to meet certain client requirements. These are known as UDFs in Hive. User Defined Functions written in Java for specific modules.
Some of UDFs are specifically designed for the reusability of code in application frameworks. The developer will develop these functions in Java and integrate those UDFs with the Hive.
During the Query execution, the developer can directly use the code, and UDFs will return outputs according to the user-defined tasks. It will provide high performance in terms of coding and execution.
For example, for string stemming we don’t have any predefined function in Hive. For this, we can write stem UDF in Java. Wherever we require Stem functionality, we can directly call this Stem UDF in Hive.
Here stem functionality means deriving words from its root words. It is like stemming algorithm reduces the words “wishing”, “wished”, and “wishes” to the root word “wish.” For performing this type of functionality, we can write UDF in Java and integrate it with Hive.
Depending on the use cases, the UDFs can be written. It will accept and produce different numbers of input and output values.
The general type of UDF will accept a single input value and produce a single output value. If the UDF is used in the query, then UDF will be called once for each row in the result data set.
In the other way, it can accept a group of values as input and return a single output value as well.