Hive Functions: Built-in & UDF [User Defined Functions] Example


Functions are built for a specific purpose to perform operations like Mathematical, arithmetic, logical, and relational on the operands of table column names.

Built-in functions

These are functions that are already available in Hive. First, we have to check the application requirement, and then we can use these built-in functions in our applications. We can call these functions directly in our application.

The syntax and types are mentioned in the following section.

Types of Built-in Functions in HIVE

  • Collection Functions
  • Date Functions
  • Mathematical Functions
  • Conditional Functions
  • String Functions
  • Misc. Functions

Collection Functions

These functions are used for collections. Collections mean the grouping of elements and returning single or array of elements depends on return type mentioned in function name.

Return Type Function Name Description
INT size(Map<K.V>) It will fetch and give the components number in the map type
INT size(Array<T>) It will fetch and give the elements number in the array type
Array<K> Map_keys(Map<K.V>) It will fetch and gives an array containing the keys of the input map. Here array is in unordered
Array<V> Map_values(Map<K.V>) It will fetch and gives an array containing the values of the input map. Here array is in unordered
Array<t> Sort_array(Array<T>) sorts the input array in ascending order of array and elements and returns it

Date Functions

These are used to perform Date Manipulations and Conversion of Date types from one type to another type:

Function Name Return Type Description
Unix_Timestamp() BigInt We will get current Unix timestamp in seconds
To_date(string timestamp) string It will fetch and give the date part of a timestamp string:
year(string date) INT It will fetch and give the year part of a date or a timestamp string
quarter(date/timestamp/string) INT It will fetch and give the quarter of the year for a date, timestamp, or string in the range 1 to 4
month(string date) INT It will give the month part of a date or a timestamp string
hour(string date) INT It will fetch and gives the hour of the timestamp
minute(string date) INT It will fetch and gives the minute of the timestamp
Date_sub(string starting date, int days) string It will fetch and gives Subtraction of number of days to starting date
Current_date date It will fetch and gives the current date at the start of query evaluation
LAST _day(string date) string It will fetch and gives the last day of the month which the date belongs to
trunc(string date, string format) string It will fetch and gives date truncated to the unit specified by the format.

Supported formats in this :

MONTH/MON/MM, YEAR/YYYY/YY.

Mathematical Functions

These functions are used for Mathematical Operations. Instead of creatingUDFs , we have some inbuilt mathematical functions in Hive.

Function Name Return Type Description
round(DOUBLE X) DOUBLE It will fetch and returns the rounded BIGINT value of X
round(DOUBLE X, INT d) DOUBLE It will fetch and returns X rounded to d decimal places
bround(DOUBLE X) DOUBLE It will fetch and returns the rounded BIGINT value of X using HALF_EVEN rounding mode
floor(DOUBLE X) BIGINT It will fetch and returns the maximum BIGINT value that is equal to or less than X value
ceil(DOUBLE a), ceiling(DOUBLE a) BIGINT It will fetch and returns the minimum BIGINT value that is equal to or greater than X value
rand(), rand(INT seed) DOUBLE It will fetch and returns a random number that is distributed uniformly from 0 to 1

Conditional Functions

These functions used for conditional values checks.

Function Name Return Type Description
if(Boolean testCondition, T valueTrue, T valueFalseOrNull) T It will fetch and gives value True when
Test Condition is of true,
gives value False Or Null otherwise.
ISNULL( X) Boolean It will fetch and gives true if X is NULL and false otherwise.
ISNOTNULL(X ) Boolean It will fetch and gives true if X is not NULL and false otherwise.

String Functions

String manipulations and string operations these functions can be called.

Function Name Return Type Description
reverse(string X) string It will give the reversed string of X
rpad(string str, int length, string pad) string It will fetch and gives str, which is right-padded with pad to a length of length(integer value)
rtrim(string X) string It will fetch and returns the string resulting from trimming spaces from the end (right hand side) of X
For example, rtrim(‘ results ‘) results in ‘ results’
space(INT n) string It will fetch and gives a string of n spaces.
split(STRING str, STRING pat) array Splits str around pat (pat is a regular expression).
Str_to_map(text[, delimiter1, delimiter2]) map<String ,String> It will split text into key-value pairs using two delimiters.

UDFs (User Defined Functions)

In Hive, the users can define own functions to meet certain client requirements. These are known as UDFs in Hive. User Defined Functions written in Java for specific modules.

Some of UDFs are specifically designed for the reusability of code in application frameworks. The developer will develop these functions in Java and integrate those UDFs with the Hive.

During the Query execution, the developer can directly use the code, and UDFs will return outputs according to the user-defined tasks. It will provide high performance in terms of coding and execution.

For example, for string stemming we don’t have any predefined function in Hive. For this, we can write stem UDF in Java. Wherever we require Stem functionality, we can directly call this Stem UDF in Hive.

Here stem functionality means deriving words from its root words. It is like stemming algorithm reduces the words “wishing”, “wished”, and “wishes” to the root word “wish.” For performing this type of functionality, we can write UDF in Java and integrate it with Hive.

Depending on the use cases, the UDFs can be written. It will accept and produce different numbers of input and output values.

The general type of UDF will accept a single input value and produce a single output value. If the UDF is used in the query, then UDF will be called once for each row in the result data set.

In the other way, it can accept a group of values as input and return a single output value as well.