# Factor in R: Categorical Variable & Continuous Variables

## What is Factor in R?

A factor in R is a variable used to categorize and store data that takes a limited number of distinct values. R stores a factor as a vector of integer codes, each mapped to a level. A factor is also known as a categorical variable, and its levels can come from either string or integer data. Factors are mostly used in statistical modeling and exploratory data analysis with R.

In a dataset, we can distinguish two types of variables: categorical and continuous.

• A categorical variable takes a limited number of values, usually drawn from a particular finite group. For example, a categorical variable in R can be country, year, gender, or occupation.
• A continuous variable, however, can take any value, from integer to decimal. For example, revenue or the price of a share.

## Categorical Variables

Categorical variables in R are stored in a factor. The code below converts a character variable into a factor variable in R. Many machine learning algorithms do not accept character data, so strings are usually converted to factors, which R represents internally with integer codes.

Syntax

`factor(x = character(), levels, labels = levels, ordered = is.ordered(x))`

Arguments:

• x: A vector of categorical data in R, typically character or integer values rather than decimals.
• levels: A vector of the possible values taken by x. This argument is optional. By default, it is the sorted unique values of the vector x.
• labels: Optional labels for the levels of x. For example, 1 can take the label `male` and 0 the label `female`.
• ordered: Determines whether the levels should be treated as ordered (TRUE) or unordered (FALSE).
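As a minimal sketch of how `levels` and `labels` work together, the snippet below recodes the 0/1 values from the `labels` description above. The vector `gender_code` and its values are hypothetical illustration data, not part of the original example.

```r
# Hypothetical coded data: 1 = male, 0 = female (illustrative values only)
gender_code <- c(1, 0, 0, 1, 1)

# levels lists the raw values; labels renames them in the same order
gender_factor <- factor(gender_code,
                        levels = c(0, 1),
                        labels = c("female", "male"))

# Print the relabelled factor and its levels
gender_factor
levels(gender_factor)
```

The order of `labels` must match the order of `levels`, so swapping one without the other would silently swap the categories.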

Example:

Let’s create a character vector and convert it to a factor.

```
# Create gender vector
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
class(gender_vector)
# Convert gender_vector to a factor
factor_gender_vector <- factor(gender_vector)
class(factor_gender_vector)
```

Output:

```
## [1] "character"
## [1] "factor"
```

It is important to transform strings into factor variables in R when performing machine learning tasks.
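To see the integer storage behind a factor, the short sketch below reuses the gender example and inspects it with the base R functions `levels()`, `as.integer()`, and `nlevels()`.

```r
# Recreate the factor from the example above
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# Levels are the sorted unique values of the input
levels(factor_gender_vector)

# Each element is stored as the integer position of its level
as.integer(factor_gender_vector)

# Number of distinct categories
nlevels(factor_gender_vector)
```

Here "Female" sorts before "Male", so "Female" gets code 1 and "Male" gets code 2.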

A categorical variable in R can be divided into nominal categorical variable and ordinal categorical variable.

### Nominal Categorical Variable

A nominal categorical variable has several values, but their order does not matter. For instance, male or female. Nominal categorical variables in R have no ordering.

```
# Create a color vector
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
# Convert the vector to factor
factor_color <- factor(color_vector)
factor_color
```

Output:

```
## [1] blue   red    green  white  black  yellow
## Levels: black blue green red white yellow
```

From factor_color, we can’t tell any order; R simply lists the levels alphabetically.
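A quick way to confirm that a nominal factor carries no ordering is sketched below with base R: `is.ordered()` reports the factor type, equality tests still work, and a rank comparison such as `<` returns `NA` (with a warning, suppressed here).

```r
# Recreate the nominal factor from the example above
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
factor_color <- factor(color_vector)

# The factor is not flagged as ordered
is.ordered(factor_color)

# Equality comparisons are still meaningful for nominal data
factor_color[1] == "blue"

# Rank comparisons are not defined for unordered factors: the result is NA
suppressWarnings(factor_color[1] < factor_color[2])
```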

### Ordinal Categorical Variable

Ordinal categorical variables do have a natural ordering. We specify the order, from lowest to highest, with the levels argument, and set ordered = TRUE so that R treats the factor as ordered. With ordered = FALSE, the factor remains unordered.

Example:

The code below creates an ordered factor. We can then use summary to count the values in each level of a factor variable in R.

```
# Create ordinal categorical vector
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered levels
factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
# Print the new variable
factor_day
```

Output:

```
## [1] evening   morning   afternoon midday    midnight  evening
## Levels: morning < midday < afternoon < evening < midnight
```

Example:

```
# Append the line to above code
# Count the number of occurrences of each level
summary(factor_day)
```

Output:

```
##   morning    midday afternoon   evening  midnight
##         1         1         1         2         1
```

R ordered the levels from ‘morning’ to ‘midnight’ as specified in the levels argument.
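Because the levels are ordered, comparison operators become meaningful. The sketch below rebuilds the same factor with `ordered = TRUE` and shows a few comparisons that rely on the level order.

```r
# Recreate the ordered factor from the example above
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Compare two elements by their position in the level order
factor_day[1] > factor_day[2]

# Highest level present in the data
max(factor_day)

# Keep only the values at or after 'afternoon'
factor_day[factor_day >= "afternoon"]
```

These comparisons follow the order given in `levels`, not alphabetical order.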

## Continuous Variables

Continuous variables are the default type in R. They are stored as numeric or integer. We can see this in the dataset below. mtcars is a built-in dataset that gathers information on different types of cars. We can load it by referring to mtcars and check the class of the variable mpg (miles per gallon). It returns numeric, indicating a continuous variable.

```
dataset <- mtcars
class(dataset$mpg)
```

Output:

`## [1] "numeric"`
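As a small sketch of the difference in storage, the snippet below checks the numeric and integer classes and summarizes `mpg` with base R functions. The literals `5L` and `5.5` are illustrative values only.

```r
dataset <- mtcars

# mpg is numeric, not a factor
is.numeric(dataset$mpg)
is.factor(dataset$mpg)

# Whole numbers written with the L suffix are stored as integer,
# decimals as numeric
class(5L)
class(5.5)

# For a continuous variable, summary reports quartiles and the mean
# rather than counts per level
summary(dataset$mpg)
```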