# Factors in R Tutorial

Learn about the factor function in R, along with an example, its structure, changing the order of levels, renaming levels, and ordering categorical values.

Jun 2020 · 5 min read

A factor is a data type in R that stores a categorical variable as a set of levels. Factors are used mainly in data analysis, and in statistical analysis in particular. Because each category name is stored once as a level, factors also reduce redundancy in the data, which can save memory.

Note: A categorical variable is a variable whose values are labels or names drawn from a limited set. For example, human blood type can be A, B, AB, or O.

## factor function

Usage: Categorize data that takes a small number of distinct values.

Parameters: factor(v), where v is a vector of values.

Let's see the example of a factor in action.

The code below creates a character vector with two categories, "Male" and "Female", also called factor values.

```r
gender <- c("Male", "Female", "Female", "Male", "Female")
```

Let's create a factor from gender using 'factor(gender)' and save it to a variable called 'gender.factor'.

```r
gender.factor <- factor(gender)
gender.factor
```

The code above gives the following output:

```
Male Female Female Male Female
Levels:
'Female' 'Male'
```

The values are printed in the same order as the input vector. The 'Levels' line lists the distinct categories, 'Female' and 'Male', which are sorted alphabetically by default.
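As a small sketch of how the levels of a factor can be inspected directly, the block below uses the base R functions 'levels()', 'nlevels()', and 'table()' on the same gender data. These functions are not part of the original example; they are shown here only as one convenient way to look at the categories.

```r
gender <- c("Male", "Female", "Female", "Male", "Female")
gender.factor <- factor(gender)

# Distinct categories, in the default sorted order
gender.levels <- levels(gender.factor)
gender.levels

# Number of distinct categories
nlevels(gender.factor)

# Count of observations in each category
gender.counts <- table(gender.factor)
gender.counts
```

Here 'table()' reports the counts in the same order as the levels, so the level order also controls how summaries are laid out.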

## Structure of factor function

Let's examine the structure of the factor by using 'str(gender.factor)' in the code below.

```r
str(gender.factor)
```

```
 Factor w/ 2 levels "Female","Male": 2 1 1 2 1
```

The output above shows a factor with 2 levels. 'factor' converts the character vector gender into a vector of integer codes: "Female" is the first level, encoded as 1, whereas "Male" is the second level, encoded as 2.

The primary purpose of encoding characters as integers is that categories tend to repeat, and storing the same label many times is redundant and can take up a lot of memory. A factor stores each label once, as a level, and keeps only the integer codes for the individual values.
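To make the encoding concrete, here is a minimal sketch that pulls the integer codes and the level labels apart and then puts them back together. The variable names 'codes', 'level.names', and 'restored' are introduced only for this illustration.

```r
gender <- c("Male", "Female", "Female", "Male", "Female")
gender.factor <- factor(gender)

# Integer codes stored for each value: 2 1 1 2 1
codes <- as.integer(gender.factor)
codes

# Each label is stored once, as a level
level.names <- levels(gender.factor)
level.names

# Indexing the labels by the codes recovers the original character vector
restored <- level.names[codes]
identical(restored, gender)
```

The codes match the '2 1 1 2 1' shown by 'str(gender.factor)', and the original vector can be rebuilt from the codes and the two stored labels.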

## Changing Order Levels

Let's change the order of the levels, so that "Male" is encoded as 1 and "Female" as 2.

Let's make a new factor for gender, setting the order of the levels by passing the vector c("Male", "Female") to the 'levels' argument. The result is saved to the variable named 'gender.factor2'.

```r
gender.factor2 <- factor(gender, levels = c("Male", "Female"))
gender.factor2
str(gender.factor2)
```

'gender.factor2' is printed along with its structure, using 'str(gender.factor2)', and the following changes can be seen:

```
Male Female Female Male Female
Levels:
'Male' 'Female'
 Factor w/ 2 levels "Male","Female": 1 2 2 1 2
```

The output shows that "Male" is now encoded as 1 and "Female" as 2, the opposite of the encoding in 'gender.factor'.
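When a factor already exists, another way to move one level to the front is the base R function 'relevel()'. This is an alternative to the 'levels' argument used in this section, not a replacement for it, and the name 'gender.releveled' is made up for this sketch.

```r
gender <- c("Male", "Female", "Female", "Male", "Female")
gender.factor <- factor(gender)   # default levels: "Female", "Male"

# Move "Male" to the first position; the values themselves do not change
gender.releveled <- relevel(gender.factor, ref = "Male")

levels(gender.releveled)        # "Male" "Female"
as.integer(gender.releveled)    # 1 2 2 1 2
as.character(gender.releveled)  # same values as the input vector
```

Note that 'relevel()' only works on unordered factors; for a full reordering, passing 'levels' to 'factor()' as shown earlier is the more general approach.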

## Renaming Factor Levels

Let's rename the values by passing two arguments: 'levels', with the existing values "Male" and "Female", and 'labels', with the new names "Gen_Male" and "Gen_Female" in the same order.

```r
factor(gender, levels = c("Male", "Female"), labels = c("Gen_Male", "Gen_Female"))
```

```
Gen_Male Gen_Female Gen_Female Gen_Male Gen_Female
Levels:
'Gen_Male' 'Gen_Female'
```

The output shows that "Male" has been renamed to "Gen_Male" and "Female" to "Gen_Female".
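If the factor has already been created, the levels can also be renamed in place by assigning to 'levels()'. The sketch below assumes the same gender data and rebuilds 'gender.factor2' so that it runs on its own; the new names must be given in the same order as the existing levels.

```r
gender <- c("Male", "Female", "Female", "Male", "Female")
gender.factor2 <- factor(gender, levels = c("Male", "Female"))

# Replace the level names in order: "Male" -> "Gen_Male", "Female" -> "Gen_Female"
levels(gender.factor2) <- c("Gen_Male", "Gen_Female")

gender.factor2
as.integer(gender.factor2)  # the integer codes are unchanged: 1 2 2 1 2
```

Only the labels change here; the underlying integer codes stay the same.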

## Ordering a Categorical Variable

Let's look at a different example, one with ordinal categorical values, where the order matters. For instance, pants come in sizes such as Large ("L"), Extra Large ("XL"), and Extra Extra Large ("XXL"), which have a natural ascending order.

The code below stores a character vector of the sizes "L", "XL", and "XXL" in 'pant'. 'pant.factor' is created with the levels listed in ascending order, 'levels = c("L", "XL", "XXL")', and with 'ordered = TRUE', which tells R that the levels are ranked and makes comparisons between values possible.

```r
pant <- c("XL", "L", "XL", "XXL", "L", "XL")
# ordered = TRUE makes the level order meaningful: L < XL < XXL
pant.factor <- factor(pant, ordered = TRUE, levels = c("L", "XL", "XXL"))
pant.factor
# Compare the first value ("XL") with the second value ("L")
pant.factor[1] > pant.factor[2]
```

```
XL L XL XXL L XL
Levels: 'L' < 'XL' < 'XXL'
TRUE
```

The output first shows all the values of the vector (XL, L, XL, XXL, L, XL), printed by 'pant.factor'. The 'Levels' line shows the levels arranged in ascending order, 'L' < 'XL' < 'XXL'. Finally, 'pant.factor[1] > pant.factor[2]' compares the first value, "XL", with the second value, "L"; because "XL" ranks above "L", the result is TRUE. Comparing 'pant.factor' with itself using '>' would instead return FALSE for every element.
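Once a factor is ordered, other base R operations respect the ranking as well. The sketch below is a hedged illustration of a few of them: 'is.ordered()', comparison against a level name, 'sort()', and 'max()'. The variable names 'at.least.xl', 'sorted.sizes', and 'largest' are introduced only for this example.

```r
pant <- c("XL", "L", "XL", "XXL", "L", "XL")
pant.factor <- factor(pant, ordered = TRUE, levels = c("L", "XL", "XXL"))

# Check that the factor carries an ordering
is.ordered(pant.factor)

# Element-wise comparison against a level name
at.least.xl <- pant.factor >= "XL"
at.least.xl

# Sorting follows the level order, not alphabetical order
sorted.sizes <- sort(pant.factor)
sorted.sizes

# The largest size present in the data
largest <- max(pant.factor)
largest
```

With an unordered factor, comparisons such as '>=' and 'max()' are not meaningful, and R returns NA with a warning or an error instead.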

## Congratulations

Congratulations, you have made it to the end of this tutorial!

In this tutorial, you have covered the factor function in R, along with an example, its structure, changing the order of levels, renaming levels, and ordering categorical values.

Check out our tutorial on Using Functions in R.
