
Factor Levels in R

This tutorial takes course material from DataCamp's free Intro to R course and allows you to practice Factors.
Sep 2018  · 6 min read

If you want to take our free Intro to R course, here is the link.

Factor Levels

When you first get a data set, you will often notice that it contains factors with specific factor levels. However, sometimes you will want to change the names of these levels for clarity or other reasons. R allows you to do this with the function levels():

levels(factor_vector) <- c("name1", "name2",...)

A good illustration is the raw data you get from a survey. Most questionnaires ask for the sex of the respondent. Here, for simplicity, only two categories were recorded, "M" and "F". (You usually need more categories for survey data; either way, you use a factor to store the categorical data.)

survey_vector <- c("M", "F", "F", "M", "M")

Recording the sex with the abbreviations "M" and "F" can be convenient if you are collecting data with pen and paper, but it can introduce confusion when analyzing the data. At that point, you will often want to change the factor levels to "Male" and "Female" instead of "M" and "F" for clarity.

Watch out: the order in which you assign the levels is important. If you build the factor with factor_survey_vector <- factor(survey_vector) and then type levels(factor_survey_vector), you'll see that it outputs [1] "F" "M". If you don't specify the levels of the factor when creating the vector, R will automatically assign them alphabetically. To correctly map "F" to "Female" and "M" to "Male", the levels should be set to c("Female", "Male"), in this order.
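The ordering pitfall described above can be sketched as follows. This is a minimal illustration, not part of the original exercise: the names wrong_order_vector and labelled_vector are hypothetical, and the second half uses the levels and labels arguments of factor() as one alternative way to make the pairing between old and new names visible in the code.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Without an explicit levels argument, factor() sorts the levels alphabetically
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector)        # "F" "M"

# Pitfall: new names are matched to the existing levels by position,
# so the wrong order silently swaps the categories
wrong_order_vector <- factor(survey_vector)
levels(wrong_order_vector) <- c("Male", "Female")
as.character(wrong_order_vector)    # "Female" "Male" "Male" "Female" "Female"

# Alternative (an assumption, not the tutorial's method): state the old
# levels and their new labels side by side when building the factor
labelled_vector <- factor(survey_vector,
                          levels = c("F", "M"),
                          labels = c("Female", "Male"))
as.character(labelled_vector)       # "Male" "Female" "Female" "Male" "Male"
```

Either approach gives the same result when the order is right; the second simply keeps the "F" to "Female" and "M" to "Male" mapping in one place.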

Instructions

  • Check out the code that builds a factor vector from survey_vector. You should use factor_survey_vector in the next instruction.
  • Change the factor levels of factor_survey_vector to c("Female", "Male"). Mind the order of the vector elements here.

Solution:

# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)

# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")

factor_survey_vector

Do not change the definition of survey_vector or the code that creates the factor vector. Remember that R is case sensitive!

Mind the order in which you have to type in the factor levels. Hint: look at the order in which the levels are printed when typing levels(factor_survey_vector).

Summarizing a Factor

After finishing this course, one of your favorite functions in R will be summary(). This will give you a quick overview of the contents of a variable:

summary(my_var)

Going back to our survey, you would like to know how many "Male" responses you have in your study, and how many "Female" responses. The summary() function gives you the answer to this question.

Instructions

Call summary() on both survey_vector and factor_survey_vector, then interpret the two results. Are they equally useful in this case?

Solution:

# Build factor_survey_vector with clean levels
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector

# Generate summary for survey_vector
summary(survey_vector)

# Generate summary for factor_survey_vector
summary(factor_survey_vector)

Have a look at the output. Because "Male" and "Female" are identified as factor levels in factor_survey_vector, R can show the number of elements in each category.

Hint: call the summary() function on both survey_vector and factor_survey_vector. It's as simple as that!
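To make the difference between the two summaries concrete, here is a small sketch that stores each result and inspects it. The names char_summary and factor_summary are illustrative assumptions and do not appear in the original exercise.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# For a character vector, summary() reports only generic information
# about the vector as a whole, not the individual values
char_summary <- summary(survey_vector)
char_summary

# For a factor, summary() counts the observations in each level
factor_summary <- summary(factor_survey_vector)
factor_summary               # Female: 2, Male: 3

# The counts come back as a named vector, so they can be used directly
factor_summary[["Male"]]     # 3
```

Only the factor version answers the question of how many "Male" and "Female" responses the survey contains.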

Ordered Factors

"Male" and "Female" are unordered (or nominal) factor levels. If you try to compare two such values, for example with the greater than operator, R returns a warning message telling you that the operator is not meaningful for factors. R attaches an equal value to the levels of such factors.
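The warning mentioned above can be reproduced with a short sketch like the one below. The variable names male, female, warning_text, and comparison are assumptions introduced for illustration; tryCatch() is used only to capture the warning text so that it can be inspected.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

male <- factor_survey_vector[1]     # first respondent
female <- factor_survey_vector[2]   # second respondent

# Capture the warning that R raises when comparing unordered factor values
warning_text <- tryCatch(
  {
    male > female
    NA_character_                   # only reached if no warning is raised
  },
  warning = function(w) conditionMessage(w)
)
warning_text

# The comparison itself yields NA rather than TRUE or FALSE
comparison <- suppressWarnings(male > female)
comparison                          # NA
```

The NA result signals that R has no basis for ranking one nominal level above another.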

But this is not always the case! Sometimes you will also deal with factors that do have a natural ordering between their categories. If this is the case, you have to make sure that you pass this information to R.

Let us say that you are leading a research team of five data analysts and that you want to evaluate their performance. To do this, you track their speed, evaluate each analyst as "slow", "medium" or "fast", and save the results in speed_vector.

Instructions

As a first step, assign speed_vector a vector with 5 entries, one for each analyst. Each entry should be either "slow", "medium", or "fast". Use the list below:

  • Analyst 1 is medium,
  • Analyst 2 is slow,
  • Analyst 3 is slow,
  • Analyst 4 is medium and
  • Analyst 5 is fast.

No need to specify these are factors yet.

Solution:

# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")

speed_vector should be a vector with 5 entries, one for each analyst's speed rating. Don't use capital letters; R is case sensitive!

Hint: assign to speed_vector a vector containing the character strings "slow", "medium", or "fast".
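As a possible next step, the natural ordering can be passed to R through the ordered and levels arguments of factor(). The sketch below is an illustration under that assumption; the name factor_speed_vector is hypothetical and is not defined in the text above.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")

# ordered = TRUE marks the factor as ordinal; levels lists the
# categories from lowest to highest
factor_speed_vector <- factor(speed_vector,
                              ordered = TRUE,
                              levels = c("slow", "medium", "fast"))
factor_speed_vector

# With an ordered factor, comparisons between elements are meaningful:
# analyst 2 ("slow") ranks below analyst 5 ("fast")
factor_speed_vector[2] < factor_speed_vector[5]   # TRUE
```

Listing the levels explicitly matters here: left to the default alphabetical order, R would rank "fast" below "medium" and "slow".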


If you want to learn more from this course, here is the link.

Check out our Factors in R Tutorial.
