
Confirmatory Factor Analysis: A Guide to Testing Constructs

Understand how CFA tests theoretical models by linking observed indicators to latent constructs. Learn the steps, assumptions, and extensions that make CFA essential in measurement validation and structural equation modeling.
Dec 16, 2025  · 9 min read

When we answer questionnaire statements such as "I am satisfied with my work" or "I like working with my colleagues," we are helping researchers gather information on concepts that cannot be measured directly, such as job satisfaction, motivation, or anxiety. Researchers call these latent constructs: abstract concepts that we cannot observe directly, but can see indirectly in responses, behaviors, or test items.

The issue, though, is whether these questions actually measure the concept they are intended to measure, and not something else.

This is where confirmatory factor analysis (CFA) comes into the picture. CFA is a statistical technique that tests whether hypothesized relationships between observed variables (think of questions on a survey) and unobserved constructs (for example, motivation) hold in the data. Before going deeper, it is important to differentiate CFA from exploratory factor analysis (EFA): EFA searches for patterns without a prior hypothesis, whereas CFA begins with a theory and tests whether the data support it.

As a critical part of structural equation modeling (SEM), CFA ensures that before we investigate relationships between concepts, the concepts themselves are measured reliably and validly.

What Is Confirmatory Factor Analysis (CFA)?

CFA considers the following question: Does my measurement model fit reality?

To unpack this question, let’s first define what a measurement model is. It is a map that links observed indicators (like survey items) to latent constructs (like depression, motivation, or satisfaction). In CFA, researchers specify this map in advance, based on theory or past research, and then test whether the data fit that structure.

To compare CFA and EFA by analogy: EFA is like navigating a new city without a map, while CFA is like checking whether our GPS directions are consistent with the actual streets.

Key Components of CFA

Latent constructs and observed indicators

  • Latent constructs are conceptual notions that can’t be directly measured, e.g., intelligence, burnout, and happiness.
  • Observed indicators are what we use to make measurements with, such as test scores, survey items, and behavior ratings.

To measure job satisfaction, we might use the following:

  • I feel satisfied with my work.
  • I am satisfied with my pay.
  • I have a good working relationship with my colleagues.

These responses all reflect the same underlying factor: job satisfaction.

Factor loadings

Factor loadings tell us to what degree each indicator picks up its underlying construct. High loadings, typically above 0.7, indicate strong representation, while moderate loadings between 0.4 and 0.7 are adequate for most cases.

Think of factor loadings like signal strength on our mobile phone. The stronger the signal, the better the indicator of the construct.

Measurement model

The measurement model specifies which observed variables correspond to which latent constructs, as dictated by theory. Unlike in EFA, where the data determine the structure, CFA imposes this structure in advance and is therefore confirmatory rather than exploratory.

The CFA Process

Having introduced the core concepts, let’s walk through the CFA process step by step. This Python demo uses the semopy package.

Step 1: Model specification

The first step in CFA is defining the theoretical model. Researchers decide which latent constructs exist and how they are observed through indicators.

Suppose we are conducting a workplace psychology study. We wish to measure two constructs:

  • Job Satisfaction (JobSat), as measured by three survey items:

    • JS1: I feel satisfied with my work.

    • JS2: I am satisfied with my pay.

    • JS3: I have good relationships with colleagues.

  • Work Engagement (WorkEng), measured by three survey items:

    • WE1: I feel energetic at work.

    • WE2: I am enthusiastic about my job.

    • WE3: I get absorbed in my work.

We also expect that Job Satisfaction and Work Engagement are correlated.

In semopy, a Python library for structural equation modeling, this model can be expressed as:

model_desc = """
JobSat =~ JS1 + JS2 + JS3
WorkEng =~ WE1 + WE2 + WE3
JobSat ~~ WorkEng
"""

Where:

  • =~ specifies which observed items load on a given latent factor.

  • ~~ specifies a covariance, here between the two latent factors.

Step 2: Data collection

CFA requires relatively large samples for stable estimation. One frequently used rule of thumb is a minimum of 200 respondents, or at least 10 observations per estimated parameter.
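As a quick illustration of that rule, we can count the free parameters in the two-factor model above. This is a back-of-the-envelope sketch; the count assumes the common identification scheme in which the first loading of each factor is fixed to 1, which is also semopy's default:

# Rough sample-size check for the two-factor model above
free_loadings = 4        # JS2, JS3, WE2, WE3 (JS1 and WE1 fixed to 1 for scaling)
error_variances = 6      # one per observed item
factor_variances = 2     # JobSat and WorkEng
factor_covariances = 1   # JobSat ~~ WorkEng

n_params = free_loadings + error_variances + factor_variances + factor_covariances
print("Free parameters:", n_params)                     # 13
print("Suggested minimum N:", max(200, 10 * n_params))  # max(200, 130) = 200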

To illustrate the process, let’s work with a toy dataset:

import pandas as pd

# Toy example: eight respondents rating six items on a 1-5 scale
df = pd.DataFrame({
    "JS1": [3, 4, 5, 2, 4, 5, 3, 4],
    "JS2": [4, 5, 4, 3, 5, 4, 3, 5],
    "JS3": [2, 3, 4, 2, 3, 4, 2, 3],
    "WE1": [5, 4, 5, 3, 4, 5, 4, 5],
    "WE2": [4, 4, 5, 2, 3, 5, 3, 4],
    "WE3": [3, 5, 4, 3, 4, 5, 3, 4]
})

In an actual study, the dataset would include hundreds of survey responses.
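For a more realistic run, we could instead simulate a few hundred responses whose structure follows the model. The sketch below uses numpy; the loadings and noise levels are made-up values chosen for illustration, not estimates from any real survey:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 300

# One latent score per construct for each simulated respondent
job_sat = rng.normal(0, 1, n)
work_eng = rng.normal(0, 1, n)

def item(latent, loading, noise_sd):
    # Observed item = loading * latent score + random measurement error
    return loading * latent + rng.normal(0, noise_sd, n)

df_sim = pd.DataFrame({
    "JS1": item(job_sat, 1.0, 0.5), "JS2": item(job_sat, 0.9, 0.5),
    "JS3": item(job_sat, 0.8, 0.6), "WE1": item(work_eng, 1.0, 0.5),
    "WE2": item(work_eng, 0.9, 0.6), "WE3": item(work_eng, 0.8, 0.6),
})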

Step 3: Model estimation

Now that we have our data and model, it’s time to estimate the parameters. Estimation yields the factor loadings, which quantify the strength of the relationship between each construct and its items, along with the other model parameters.

The most popular estimation method is Maximum Likelihood (ML). It assumes that the observed data are continuous and follow a multivariate normal distribution. For scenarios where this assumption does not hold, such as skewed or categorical data, estimators like Weighted Least Squares (WLS) are recommended.

In Python, using semopy:

from semopy import Model

mod = Model(model_desc)  # build the model from the specification string
mod.fit(df)              # estimate parameters (maximum likelihood by default)

This fits the CFA model to the data, estimating factor loadings, correlations, and variances.
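If the normality assumption is doubtful, the objective can be swapped when fitting. The snippet below assumes semopy's obj argument, which accepts alternative objectives such as "DWLS" in recent versions; check the documentation of your installed version:

# Assumed semopy API: switch from the default Wishart ML ("MLW") to
# diagonally weighted least squares for ordinal or non-normal items
mod.fit(df, obj="DWLS")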

Step 4: Model fit evaluation

Once the model is estimated, the next step is to assess whether this model fits the data well.

Fit is assessed using statistical indices:

  • Chi-square (χ²): A non-significant value indicates good fit, but the test is very sensitive to sample size.
  • RMSEA (< 0.06): Lower values indicate better approximate fit.
  • CFI (> 0.95): Compares the target model with a baseline model.
  • SRMR (< 0.08): The average difference between predicted and observed correlations.

In Python:

from semopy import calc_stats

# calc_stats returns a one-row pandas DataFrame of fit indices; the column
# names below are those used by semopy (check your installed version)
stats = calc_stats(mod)

print("Chi-square:", stats["chi2"].iloc[0])
print("Degrees of Freedom:", stats["DoF"].iloc[0])
print("p-value:", stats["chi2 p-value"].iloc[0])
print("CFI:", stats["CFI"].iloc[0])
print("RMSEA:", stats["RMSEA"].iloc[0])
# Note: semopy's calc_stats does not report SRMR; compute it separately if needed

This output shows whether the theoretical model matches the observed data. If indices are within recommended thresholds, the model is considered a good fit.
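To keep interpretation consistent across models, the thresholds above can be codified in a small helper. This is a minimal sketch, assuming the calc_stats column names shown earlier:

# Minimal fit-check helper against the conventional cut-offs listed above
def check_fit(stats):
    checks = {
        "CFI > 0.95": stats["CFI"].iloc[0] > 0.95,
        "RMSEA < 0.06": stats["RMSEA"].iloc[0] < 0.06,
        "chi2 p-value > 0.05": stats["chi2 p-value"].iloc[0] > 0.05,
    }
    for rule, passed in checks.items():
        print(rule, "->", "OK" if passed else "review model")

check_fit(stats)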

Step 5: Model refinement

In cases where the fit is poor, researchers often inspect modification indices. These indices suggest how model fit might improve if certain parameters, such as error covariances between specific items or additional factor loadings, were freed for estimation.

However, it is important that modifications be guided by theory, not just by statistics. Otherwise, the model may fit one dataset very well but fail to generalize to others.

Estimated parameters (e.g., factor loadings) can be inspected as shown below:

estimates = mod.inspect()  # parameter estimates as a pandas DataFrame
print(estimates[['lval', 'op', 'rval', 'Estimate']])

This shows the extent to which each item loads on its underlying construct. Items with low loadings (< 0.4) are questionable indicators of the construct they are meant to measure, and become candidates for removal or revision.

Chi-square: 7.086071
Degrees of Freedom: 8

Parameter estimates:
       lval  op     rval  Estimate  Std. Err    z-value   p-value
0       JS1   ~   JobSat  1.000000         -          -         -
1       JS2   ~   JobSat  0.991985  0.079766  12.436258       0.0
2       JS3   ~   JobSat  0.901155  0.074451  12.103973       0.0
3       WE1   ~  WorkEng  1.000000         -          -         -
4       WE2   ~  WorkEng  0.879609  0.083147  10.578944       0.0
5       WE3   ~  WorkEng  0.758832  0.072321  10.492585       0.0
6    JobSat  ~~  WorkEng -0.014492  0.017919  -0.808725  0.418674
7    JobSat  ~~   JobSat  0.283181  0.033256   8.515047       0.0
8   WorkEng  ~~  WorkEng  0.332945  0.042414   7.849889       0.0
9       JS1  ~~      JS1  0.182918  0.022465   8.142378       0.0
10      JS2  ~~      JS2  0.215892  0.023358   9.242882       0.0
11      JS3  ~~      JS3  0.293970    0.0243  12.097738       0.0
12      WE1  ~~      WE1  0.225318  0.030959   7.277931       0.0
13      WE2  ~~      WE2  0.304496  0.028694  10.611756       0.0
14      WE3  ~~      WE3  0.269805  0.023279  11.590123       0.0
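Building on this output, weak items can be flagged programmatically. A sketch, assuming the inspect() column names shown above and the 0.4 rule of thumb (which, strictly speaking, applies to standardized loadings):

import pandas as pd

# In a pure CFA, rows with operator "~" are exactly the factor loadings
loadings = estimates[estimates["op"] == "~"].copy()
loadings["Estimate"] = pd.to_numeric(loadings["Estimate"], errors="coerce")
weak = loadings[loadings["Estimate"].abs() < 0.4]
print(weak[["lval", "rval", "Estimate"]] if len(weak) else "No items load below 0.4")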

A Quick Example in Context

The CFA results indicate that JS2 (0.992) and JS3 (0.901) load strongly on Job Satisfaction; JS1's loading is fixed at 1.00 to set the scale of the factor, so it is not itself evidence of item quality. All three survey questions appear to contribute meaningfully to measuring Job Satisfaction, with no item standing out as problematic. Keep in mind that these are unstandardized loadings, so the conventional 0.7 cut-off, which applies to standardized loadings, is only a rough guide here.

Similarly, WE2 (0.880) and WE3 (0.759) load strongly on Work Engagement, with WE1 fixed at 1.00 for scaling.

The covariance between Job Satisfaction and Work Engagement is near zero (-0.014, p = 0.42), which indicates that the two constructs are fairly independent of each other in this sample.

Based on these results, our approach is:

  • Keep all three items for Job Satisfaction (JS1, JS2, JS3), since all of them load highly on the construct.

  • Retain all three items for Work Engagement (WE1, WE2, WE3), as each shows meaningful loadings.

  • Make no further changes, since the loadings are high and significant. (The factor covariance is not significant, but that reflects the near-independence of the two constructs, not a measurement problem.)

This confirms that the measurement model is working as intended. The observed indicators reliably reflect their latent constructs.

CFA Requirements and Assumptions

CFA relies on several key assumptions for its results to be valid and interpretable. Understanding these assumptions helps us gauge when CFA is appropriate and how to respond when the data do not fully meet them.

Multivariate normality

CFA typically uses Maximum Likelihood (ML) estimation, which assumes that the observed variables follow a multivariate normal distribution.

However, when the responses are highly skewed or categorical, this assumption does not hold, and the factor loadings, standard errors, and fit indices may be biased. To address this, alternative estimators such as Weighted Least Squares (WLS), or robust approaches such as the Satorra-Bentler correction, are used; these do not require strict normality.
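A quick way to screen for trouble before committing to ML is to check each item's distribution. The sketch below uses scipy's univariate skewness and excess kurtosis; this is a necessary but not sufficient check for multivariate normality, and the cut-offs in the comment are common rules of thumb rather than hard limits:

from scipy.stats import skew, kurtosis

# |skewness| > 2 or |excess kurtosis| > 7 is often read as a warning
# sign that normal-theory ML may give biased standard errors
for col in df.columns:
    print(f"{col}: skew={skew(df[col]):.2f}, kurtosis={kurtosis(df[col]):.2f}")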

Adequate sample size

CFA involves estimating multiple parameters (factor loadings, variances, covariances). Small sample sizes can lead to unstable estimates and unreliable conclusions.

A common guideline suggests at least 200 participants or 10 observations per estimated parameter. The greater the sample size, the more precise and generalizable the results.

Correct model specification

CFA tests a pre-specified theoretical model. If the model is misspecified, e.g., assigns indicators to the wrong latent factor, CFA cannot correct it.

Random sampling

The data should ideally come from a random sample so that the findings are generalizable beyond the dataset used in the study.

Non-random or biased samples may produce results that reflect quirks of the sample rather than the construct being studied.


Comparison with Exploratory Factor Analysis (EFA)

Let’s revisit the comparison between EFA and CFA. EFA is data-driven: it uncovers latent structures without prior assumptions, and all factor loadings are freely estimated, letting the data “speak for themselves.” CFA is theory-driven: it tests a pre-specified, hypothesized structure with constrained loadings.

Researchers typically use the two methods in sequence: EFA to explore potential structures, and CFA to confirm them. This approach provides scope for both empirical discovery and theoretical validation.
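On a realistically sized dataset, one way to run that sequence in Python is to split the sample, explore on one half, and confirm on the other. The EFA step below assumes the factor_analyzer package; the split keeps the CFA from merely restating what the EFA already found in the same data:

from factor_analyzer import FactorAnalyzer
from sklearn.model_selection import train_test_split

# Discover structure on one half, confirm it on the held-out half
explore, confirm = train_test_split(df, test_size=0.5, random_state=0)

efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(explore)
print(efa.loadings_)  # see which items cluster on which factor

# A CFA spec written from the EFA pattern is then fit on the held-out half:
# mod = Model(model_desc); mod.fit(confirm)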

Advanced Topics and Extensions in CFA

Multilevel and longitudinal CFA

  • Multilevel CFA is designed to account for nested data, for example, students within classrooms.
  • Longitudinal CFA examines how constructs evolve, which helps in assessing measurement stability, change, and temporal invariance.

Second-order and bifactor models

  • Second-order CFA models relationships among latent variables by treating first-order factors, for example, verbal, spatial, and numerical abilities, as indicators of a higher-order construct, such as general intelligence (see the sketch after this list).
  • Bifactor models separate the variance attributed to general factors from that due to specific sub-dimensions.
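In semopy syntax, a second-order model of this kind can be sketched as follows (the item names V1 through N3 are hypothetical):

# Hypothetical second-order CFA: three first-order ability factors
# serve as indicators of a general factor G
second_order_desc = """
Verbal =~ V1 + V2 + V3
Spatial =~ S1 + S2 + S3
Numerical =~ N1 + N2 + N3
G =~ Verbal + Spatial + Numerical
"""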

Bayesian CFA

Bayesian CFA is a flexible alternative to traditional estimation. It incorporates prior distributions into parameter estimation, which enhances model stability and makes it useful for small samples, complex models, or non-normal data.

Applications of CFA

CFA is widely used for:

  • Scale validation to verify that a set of observed items accurately reflects the theoretical construct it is intended to measure, e.g., a new anxiety questionnaire.
  • Testing theoretical models to evaluate whether the structure of relationships among latent constructs conforms to theoretical expectations.
  • Comparison across groups to test for measurement invariance. For example, it helps in determining whether a satisfaction scale works the same across cultures or genders.
  • Refining and improving psychometric tools by identifying weak or redundant items. It improves the reliability and validity of tests and surveys.

Limitations and Challenges

Having discussed CFA in depth, it is important to recognize that it comes with certain limitations. Its effectiveness hinges on the following factors:

  • Quality of the underlying theory: Weak or poorly specified theoretical foundations can lead to models with poor fit.
  • Assumptions: Non-normal data distributions or small sample sizes can compromise estimates, which in turn impacts interpretability. 
  • Risk of overfitting: excessively modifying a model to chase better fit adds correlated errors that may work for a single dataset but fail to generalize elsewhere.

Conclusion

Confirmatory factor analysis bridges theory and data, making it possible to measure unobservable constructs across psychology, education, marketing, and organizational research. It provides a framework for validating latent constructs and lays strong foundations for measurement. As part of structural equation modeling, CFA continues to evolve through developments such as multilevel, longitudinal, and Bayesian extensions.



FAQs

What is confirmatory factor analysis (CFA) and how does it differ from exploratory factor analysis (EFA)?

CFA is a statistical technique to test whether a hypothesized factor structure fits the observed data, while EFA explores potential structures without pre-specified models.

What are latent constructs and observed indicators in CFA?

Latent constructs are unobservable concepts like motivation or job satisfaction. Observed indicators are measurable items, such as survey questions or test scores, that reflect these latent constructs.

What are the key assumptions of CFA?

CFA assumes a correctly specified measurement model, multivariate normality (for ML estimation), adequate sample size, and ideally, randomly sampled data.

How do I evaluate if my CFA model fits the data?

Model fit is assessed using indices such as Chi-square, RMSEA, CFI, and SRMR. Acceptable thresholds indicate whether the theoretical model is consistent with the observed data.

How can CFA be implemented in Python?

CFA can be implemented in Python using packages like semopy. The article provides a step-by-step demo showing model specification, estimation, and interpretation of factor loadings and fit indices.
