Course Notes: Factor Analysis in R

Course Notes - EXPLORATORY FACTOR ANALYSIS in R

The datasets used in this course are available in the datasets folder.

Psycho + metrics

Psychometrics is the study of unobservable variables - "psycho" meaning "of the mind" and "metrics" meaning "related to measurement." Researchers develop measures to capture unobservable variables, such as personality or IQ, and factor analysis is a valuable tool for use both during and after the development process. In Chapter 1, you'll learn how to examine the statistical properties of a measure designed around one construct.

Learning objectives

In our first lesson, you'll start by running a unidimensional exploratory factor analysis, or EFA, on examinees' responses to questions, which we'll refer to as items. Next, you'll look at the results of the EFA to examine two key pieces of output. Items' factor loadings quantify their relationship to the underlying factor, which tells you how well each item is performing. Individuals' factor scores provide an estimate of the amount of the underlying factor each examinee possesses, which helps assign scores to examinees.

Factor analysis' relationship to other analyses

You may be familiar with some related ways of analyzing data. Factor analysis can be thought of as midway between classical test theory and structural equation modeling. Whereas classical test theory reports scores as the unweighted sum of item scores, factor analysis assigns item weights according to the correlation matrix. These correlations allow us to infer the presence of a latent variable or variables. Structural equation modeling extends this approach to model the relationships between latent variables.

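The contrast between unweighted sum scores and correlation-based item weights can be sketched with simulated data. This is only an illustration: the data, the true loadings, and all object names are hypothetical, and base R's factanal() (maximum likelihood) stands in for the psych functions used in the course.

```r
# Simulated data: 200 respondents, 4 items driven by one latent trait.
# All names and values here are hypothetical, for illustration only.
set.seed(42)
n <- 200
theta <- rnorm(n)                        # latent trait (never observed in practice)
true_loadings <- c(0.9, 0.7, 0.5, 0.3)   # items differ in how well they measure theta
items <- sapply(true_loadings,
                function(l) l * theta + rnorm(n, sd = sqrt(1 - l^2)))
colnames(items) <- paste0("item", 1:4)

# Classical test theory: every item counts equally
sum_score <- rowSums(items)

# Factor analysis: item weights are derived from the correlation matrix.
# factanal() is base R's maximum likelihood factor analysis.
fit <- factanal(items, factors = 1, scores = "regression")
weighted_score <- fit$scores[, 1]

# Estimated loadings, and how closely the two kinds of score agree
round(unclass(fit$loadings)[, 1], 2)
cor(sum_score, weighted_score)
```

The two scores are usually highly correlated; they diverge most when items differ a lot in quality, because the factor-based score leans on the stronger items.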
Types of factor analysis

It's important to note that there are two different types of factor analysis: exploratory and confirmatory. Exploratory factor analysis is used during measure development to explore factor structure and determine which items do a good job of measuring the construct. Confirmatory factor analysis is used to validate a measure after development.

Package

This course primarily uses the psych package, which was developed by William Revelle. You can load the package using the library() function.

Dataset

You'll use the gcbs dataset in the first chapter. This dataset contains 2,495 responses to 15 multiple-choice questions, or items, which are designed to test respondents' level of belief in conspiracies.

Item types

Items in the gcbs dataset are categorized into five conspiracy facets. For example, Item 2, "The government permits or perpetrates acts of terrorism on its own soil, disguising its involvement," is a government malfeasance item. Item 8, "Evidence of alien contact is being concealed from the public," is an extraterrestrial coverup item.

Factor structure

The 15 items are hypothesized to reflect five lower-order factors corresponding to their five types. These five factors share a single higher-order factor: conspiracist belief. Hierarchical factor structures like this require structural equation modeling to estimate, but exploratory and confirmatory factor analysis can estimate either a single-factor or five-factor structure.

Factor structure

In Chapter 1, you'll ignore the five lower-order factors and use a single-factor EFA to estimate the items' relationship to conspiracist belief. This analysis will give you information about how well each item measures a single underlying factor and information about how much of the factor each examinee possesses. You'll learn how to deal with multiple factors in later chapters.

Using the fa() function

The fa() function is your ticket to running EFAs in the psych package. The object created from this function contains lots of valuable information such as items' factor loadings, individuals' factor scores, and fit statistics. In the first lesson of this chapter, you'll learn how to use the fa() function to run a single-factor EFA, access and interpret its output, and diagram the results.

Let's practice!

Now that we've covered the basic theory behind factor analysis, let's get to some actual code!
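As a quick preview of what the fa() result object holds, here is a minimal sketch on simulated data. The data frame sim_items and the object efa are hypothetical stand-ins, not the course's gcbs data or EFA_model.

```r
library(psych)

# Hypothetical stand-in for item responses: 300 simulated respondents, 6 items
set.seed(123)
n <- 300
theta <- rnorm(n)   # the latent factor, unobserved in real data
sim_items <- as.data.frame(
  sapply(1:6, function(i) 0.7 * theta + rnorm(n, sd = 0.7))
)
names(sim_items) <- paste0("Q", 1:6)

# Single-factor EFA (nfactors = 1 is also the default)
efa <- fa(sim_items, nfactors = 1)

class(efa)         # the result is a list carrying the classes "psych" and "fa"
head(names(efa))   # a few of the elements stored in the result
efa$loadings       # items' factor loadings
head(efa$scores)   # individuals' factor scores
efa$RMSEA          # one of the fit statistics
```

Each piece is reached with the usual `$` list accessor, which is the same pattern the exercises below use on EFA_model.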

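# Load the psych package
library(psych)

# Read the item responses from the datasets folder
gcbs <- readRDS("datasets/GCBS_data.rds")
head(gcbs)

# Run a single-factor EFA, then view the path diagram and the loadings
EFA_model <- fa(gcbs, nfactors = 1)
fa.diagram(EFA_model)
EFA_model$loadings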

Starting out with a unidimensional EFA

Let's begin by using the psych package to conduct a single-factor exploratory factor analysis (EFA). The fa() function conducts an EFA on your data. When you're using this in the real world, be sure to use a dataset that contains only item responses; other types of data will cause errors, incorrect results, or both. In the gcbs dataset, these are examinees' responses to 15 items from the Generic Conspiracist Beliefs Scale, which is designed to measure conspiracist beliefs.

A single-factor EFA provides information on each item's relationship to the one factor that all of the items are hypothesized to measure. The results give you basic information about how well each item relates to that hypothesized construct.

Be sure to save the analysis result object so you can return to it later.

Load the psych package to gain access to the necessary functions for your exploratory factor analysis.
Then, run a single-factor EFA on the gcbs dataset and save the result to an object named EFA_model.
Finally, call the EFA_model object to see how the items in the dataset relate to the extracted factor.

# Load the psych package
library(psych)
 
# Conduct a single-factor EFA
EFA_model <- fa(gcbs)

# View the results
EFA_model

You now know how to conduct a single-factor EFA, which tells you each variable's relationship to the factor of interest. In the results, the function has named the factor MR1 because it is the first factor extracted using minimum residual (minres) estimation, the default estimation method in fa().
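To see where the factor name comes from, the sketch below fits the same simulated data with two estimation methods through fa()'s fm argument. The data and object names are hypothetical; only fa() and its fm argument come from the psych package.

```r
library(psych)

# Hypothetical simulated item responses: 300 respondents, 6 items
set.seed(123)
n <- 300
theta <- rnorm(n)
sim_items <- as.data.frame(
  sapply(1:6, function(i) 0.7 * theta + rnorm(n, sd = 0.7))
)
names(sim_items) <- paste0("Q", 1:6)

# Same data, two estimation methods
efa_minres <- fa(sim_items, nfactors = 1, fm = "minres")  # the default method
efa_ml     <- fa(sim_items, nfactors = 1, fm = "ml")      # maximum likelihood

# The factor name reflects the estimation method
colnames(efa_minres$loadings)   # "MR1"
colnames(efa_ml$loadings)       # "ML1"

# Loadings from the two methods, side by side
round(cbind(minres = unclass(efa_minres$loadings)[, 1],
            ml     = unclass(efa_ml$loadings)[, 1]), 2)
```

With well-behaved data like this, the two methods should give very similar loadings; the prefix in the factor name is mainly a reminder of how the solution was estimated.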

Viewing and visualizing the factor loadings

Each fa() results object is actually a list, and each element of the list contains specific information about the analysis, including factor loadings. Factor loadings represent the strength and directionality of the relationship between each item and the underlying factor, and they can range from -1 to 1.

You can also create a diagram of loadings. The fa.diagram() function takes a result object from fa() and creates a path diagram showing the items’ loadings ordered from strongest to weakest. Path diagrams are more common for structural equation modeling than for factor analysis, but this type of visualization can be a helpful way to represent your results.

View the items' factor loadings by accessing the loadings element of the results object. These values show the strength and direction of their relationships.
Then, visualize the EFA results in a path diagram.

# Set up the single-factor EFA
EFA_model <- fa(gcbs)

# View the factor loadings
EFA_model$loadings

# Create a path diagram of the items' factor loadings
fa.diagram(EFA_model)
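If you want the loadings as ordinary numbers rather than printed output, one option is to convert them to a data frame and sort them yourself. The sketch below uses simulated data with deliberately uneven item quality; sim_items, load_df, and the true loading values are all hypothetical.

```r
library(psych)

# Hypothetical simulated items whose true loadings range from strong to weak
set.seed(123)
n <- 300
theta <- rnorm(n)
true_loadings <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
sim_items <- as.data.frame(
  sapply(true_loadings, function(l) l * theta + rnorm(n, sd = sqrt(1 - l^2)))
)
names(sim_items) <- paste0("Q", 1:6)

efa <- fa(sim_items, nfactors = 1)

# unclass() strips the "loadings" class so the values behave like a plain matrix
load_mat <- unclass(efa$loadings)
load_df <- data.frame(item = rownames(load_mat),
                      loading = load_mat[, 1],
                      row.names = NULL)

# Order items from strongest to weakest absolute loading
load_df <- load_df[order(-abs(load_df$loading)), ]
load_df
```

Sorting by absolute value keeps strongly negative items near the top, since the sign shows direction while the magnitude shows strength.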

Interpreting individuals' factor scores

The EFA_model object also contains a named list element, scores, which contains factor scores for each person. These factor scores are an indication of how much or how little of the factor each person is thought to possess. Factor scores are not computed for examinees with missing data.

Use head() and rowSums() to look at the first six respondents' responses and their total (sum) scores. Sum scores give a rough indication of how much of the construct each respondent possesses.
Use head() to look at the first few factor scores. Comparing them with the sum scores helps illustrate the relationship between responses and factor scores.
To get a feel for how the factor scores are distributed, use summary() to look at summary statistics.
Use plot() and density() to create a density plot of the estimated factor scores for a visual representation. Density plots show the distribution of data over a continuous interval and can give you a sense of what your data look like.

# Take a look at the first few lines of the response data and their corresponding sum scores
head(gcbs)
rowSums(head(gcbs))

# Then look at the first few lines of individuals' factor scores
head(EFA_model$scores)

# To get a feel for how the factor scores are distributed, look at their summary statistics and density plot.
summary(EFA_model$scores)

plot(density(EFA_model$scores, na.rm = TRUE), 
    main = "Factor Scores")
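The relationship between sum scores and factor scores, and the effect of missing responses, can be checked directly. This is a minimal sketch on simulated data with one response deliberately set to NA; the data and object names are hypothetical, and fa() is left at its default settings.

```r
library(psych)

# Hypothetical simulated item responses: 300 respondents, 6 items
set.seed(123)
n <- 300
theta <- rnorm(n)
sim_items <- as.data.frame(
  sapply(1:6, function(i) 0.7 * theta + rnorm(n, sd = 0.7))
)
names(sim_items) <- paste0("Q", 1:6)

# Respondent 1 skips one item
sim_items[1, "Q2"] <- NA

efa <- fa(sim_items, nfactors = 1)
f_scores   <- efa$scores[, 1]
sum_scores <- rowSums(sim_items)   # also NA for respondent 1

# Respondent 1 has no factor score; everyone else does
head(cbind(sum_score = sum_scores, factor_score = f_scores))

# Agreement between the two kinds of score among complete cases
cor(sum_scores, f_scores, use = "complete.obs")
```

Because the simulated items are all about equally good, the two scores agree closely here; with items of uneven quality the correlation would typically be lower.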