Hive Functions: Built-in & UDF [User Defined Functions] Example

Functions are built for a specific purpose to perform operations like Mathematical, arithmetic, logical, and relational on the operands of table column names.

Built-in functions

These are functions that are already available in Hive. First, we have to check the application requirement, and then we can use these built-in functions in our applications. We can call these functions directly in our application.

The syntax and types are mentioned in the following section.

Types of Built-in Functions in HIVE

Collection Functions
Date Functions
Mathematical Functions
Conditional Functions
String Functions
Misc. Functions

Collection Functions

These functions are used for collections. Collections mean the grouping of elements and returning single or array of elements depends on return type mentioned in function name.

Return Type	Function Name	Description
INT	size(Map<K.V>)	It will fetch and give the components number in the map type
INT	size(Array<T>)	It will fetch and give the elements number in the array type
Array<K>	Map_keys(Map<K.V>)	It will fetch and gives an array containing the keys of the input map. Here array is in unordered
Array<V>	Map_values(Map<K.V>)	It will fetch and gives an array containing the values of the input map. Here array is in unordered
Array<t>	Sort_array(Array<T>)	sorts the input array in ascending order of array and elements and returns it

Date Functions

These are used to perform Date Manipulations and Conversion of Date types from one type to another type:

Function Name	Return Type	Description
Unix_Timestamp()	BigInt	We will get current Unix timestamp in seconds
To_date(string timestamp)	string	It will fetch and give the date part of a timestamp string:
year(string date)	INT	It will fetch and give the year part of a date or a timestamp string
quarter(date/timestamp/string)	INT	It will fetch and give the quarter of the year for a date, timestamp, or string in the range 1 to 4
month(string date)	INT	It will give the month part of a date or a timestamp string
hour(string date)	INT	It will fetch and gives the hour of the timestamp
minute(string date)	INT	It will fetch and gives the minute of the timestamp
Date_sub(string starting date, int days)	string	It will fetch and gives Subtraction of number of days to starting date
Current_date	date	It will fetch and gives the current date at the start of query evaluation
LAST _day(string date)	string	It will fetch and gives the last day of the month which the date belongs to
trunc(string date, string format)	string	It will fetch and gives date truncated to the unit specified by the format. Supported formats in this : MONTH/MON/MM, YEAR/YYYY/YY.

Mathematical Functions

These functions are used for Mathematical Operations. Instead of creatingUDFs , we have some inbuilt mathematical functions in Hive.

Function Name	Return Type	Description
round(DOUBLE X)	DOUBLE	It will fetch and returns the rounded BIGINT value of X
round(DOUBLE X, INT d)	DOUBLE	It will fetch and returns X rounded to d decimal places
bround(DOUBLE X)	DOUBLE	It will fetch and returns the rounded BIGINT value of X using HALF_EVEN rounding mode
floor(DOUBLE X)	BIGINT	It will fetch and returns the maximum BIGINT value that is equal to or less than X value
ceil(DOUBLE a), ceiling(DOUBLE a)	BIGINT	It will fetch and returns the minimum BIGINT value that is equal to or greater than X value
rand(), rand(INT seed)	DOUBLE	It will fetch and returns a random number that is distributed uniformly from 0 to 1

Conditional Functions

These functions used for conditional values checks.

Function Name	Return Type	Description
if(Boolean testCondition, T valueTrue, T valueFalseOrNull)	T	It will fetch and gives value True when Test Condition is of true, gives value False Or Null otherwise.
ISNULL( X)	Boolean	It will fetch and gives true if X is NULL and false otherwise.
ISNOTNULL(X )	Boolean	It will fetch and gives true if X is not NULL and false otherwise.

String Functions

String manipulations and string operations these functions can be called.

Function Name	Return Type	Description
reverse(string X)	string	It will give the reversed string of X
rpad(string str, int length, string pad)	string	It will fetch and gives str, which is right-padded with pad to a length of length(integer value)
rtrim(string X)	string	It will fetch and returns the string resulting from trimming spaces from the end (right hand side) of X For example, rtrim(‘ results ‘) results in ‘ results’
space(INT n)	string	It will fetch and gives a string of n spaces.
split(STRING str, STRING pat)	array	Splits str around pat (pat is a regular expression).
Str_to_map(text[, delimiter1, delimiter2])	map<String ,String>	It will split text into key-value pairs using two delimiters.

UDFs (User Defined Functions)

In Hive, the users can define own functions to meet certain client requirements. These are known as UDFs in Hive. User Defined Functions written in Java for specific modules.

Some of UDFs are specifically designed for the reusability of code in application frameworks. The developer will develop these functions in Java and integrate those UDFs with the Hive.

During the Query execution, the developer can directly use the code, and UDFs will return outputs according to the user-defined tasks. It will provide high performance in terms of coding and execution.

For example, for string stemming we don’t have any predefined function in Hive. For this, we can write stem UDF in Java. Wherever we require Stem functionality, we can directly call this Stem UDF in Hive.

Here stem functionality means deriving words from its root words. It is like stemming algorithm reduces the words “wishing”, “wished”, and “wishes” to the root word “wish.” For performing this type of functionality, we can write UDF in Java and integrate it with Hive.

Depending on the use cases, the UDFs can be written. It will accept and produce different numbers of input and output values.

The general type of UDF will accept a single input value and produce a single output value. If the UDF is used in the query, then UDF will be called once for each row in the result data set.

In the other way, it can accept a group of values as input and return a single output value as well.