T-tests in R Tutorial: Learn How to Conduct T-Tests

Determine if there is a significant difference between the means of the two groups using t.test() in R.

Mar 14, 2023 · 10 min read

Introduction

Suppose you have two groups of sales teams and you want to check if the average number of mobile phones sold in a week by both teams is the same or not. How will you compare the performance?

You will take the average number of mobile phones sold to 200 random customers by respective teams and determine the difference. The first marketing team, on average, has sold 120 phones, whereas the second team has sold 80.

So, it is clear that the first team has performed better in sales than the second team. Right? We can’t be sure; the dataset is collected from random customers and does not represent all of the people who bought the phone in that week.

So how do we determine which team performed better? We will use a t-test to understand if the difference between the two averages is real or just random luck.

The t-test is a statistical hypothesis that takes samples from both groups to determine if there is a significant difference between the means of the two groups. How does it work? It compares both sample mean and standard deviations while considering sample size and the degree of variability of the data.

In this tutorial, we will learn about the classification of t-tests (one-sample, two-samples, and paired sample t-test) with R code examples and learn to interpret the results.

Note: if you are new to R, take a mini Introduction to R Programming course to understand the basics.

Your Path to Mastering R

Start from scratch and build core R skills for data science.

Start Learning for Free

t.test() Function in R

R language provides us with a simple t.test built-in function for One Sample, Two Samples, and Paired t-tests.

There are two ways of using the t.test function: default and formula methods.

Default method

You provide numerical samples from the x group and y group, specifying the alternative hypothesis, hypothesized mean mu, and confidence level of the interval. Moreover, you can perform Paired t-test by toggling the paired argument and Two Sample t-test with equal variance by changing var.equal argument.

t.test(x, y,
      alternative = c("two.sided", "less", "greater"),
      mu = 0, paired = FALSE, var.equal = FALSE,
      conf.level = 0.95, ...)

Formula method

In this method, you provide the formula x~y, where x is a numeric vector or a column from the data, and y is a binary column containing the types of groups.

t.test(formula, data, subset, na.action, ...)

How to Perform One Sample t-test in R

One-sample t-test is the statistical hypothesis to test if there is a significant difference between the sample mean and hypothesis or assumed population mean. The test compares the sample mean to the hypothesis mean, while considering the variability in the data.

x̄₁ = Sample mean
μ = Hypothesized population mean
s = Sample standard deviation
n = Sample size

In this tutorial, we will be using Carbon Dioxide Uptake in Grass Plants R dataset for t-test code examples. The dataset has 84 rows and 5 columns, and it was collected from an experiment to test the cold tolerance of the grass species Echinochloa crus-galli. We will be mostly considering uptake, Treatment, and Type columns for our tests.

head(CO2)

In the example, we will be using the conc (carbon dioxide concentrations) column from the dataset.

We can observe the mean, distribution, and outliers using a boxplot.

boxplot(CO2$conc)

For a one-sample t-test, we will be using `t.test(x,mu=0)`. Where x is the variable, mu is set by the null hypothesis. In our case, it is 550.

t.test(CO2$conc, mu = 550)

Result:

The concentration of carbon dioxide is not equal to 550 and significantly lower than the hypothesized population mean.

One Sample t-test

data:  CO2$conc
t = -3.5617, df = 83, p-value = 0.0006134
alternative hypothesis: true mean is not equal to 550
95 percent confidence interval:
370.7805 499.2195
sample estimates:
mean of x
      435

How to Perform Two Sample t-test in R

In the Two Sample t-tests, we will be comparing carbon dioxide uptake rates of two treatment types: non-chilled and chilled.

We can visualize the distribution of two groups using a boxplot.

plot(uptake ~ Treatment, data=CO2)

Welch Two Sample t-test

It is a statistical hypothesis that investigates if there is a significant difference between the mean of two independent groups that may have unequal variance. The test is comparing the means of two groups while considering the variability within each group.

x̄₁ = Sample mean of first group
x̄₂ = Sample mean of second group
n₁ = Sample size of first group
n₂ = Sample size of second group
s₁² = Sample variance of first group
s₂² = Sample variance of second group

By default, the t.test() function assumes that the variance of two groups is unequal (var.equal=FALSE). So, we don’t have to make any changes.

We are using the formula method to obtain t-test results, where uptake is a numerical vector and Treatment is a binary category column of the CO2 dataset.

t.test(uptake ~ Treatment, data = CO2)

Result:

There is a significant difference in the means of the two groups, and the nonchilled group has higher uptake than the chilled group.

 Welch Two Sample t-test

data:  uptake by Treatment
t = 3.0485, df = 80.945, p-value = 0.003107
alternative hypothesis: true difference in means between group nonchilled and group chilled is not equal to 0
95 percent confidence interval:
  2.382366 11.336682
sample estimates:
mean in group nonchilled    mean in group chilled
                30.64286                 23.78333

Two Sample t-test with Equal Variance

The Two Sample t-test is a statistical hypothesis test to determine if there is a significant difference between the mean of two independent groups while assuming that the variance of the two groups is equal. The test compares the means of two groups while considering the variability within each group.

x̄₁ = Sample mean of first group
x̄₂ = Sample mean of second group
n₁ = Sample size of first group
n₂ = Sample size of second group
s_p = Pooled standard deviation

To perform Two Sample t-tests with equal variance, we have to set var.equal TRUE and run the test again with the same formula and dataset.

t.test(uptake ~ Treatment, data = CO2, var.equal = TRUE)

Result:

As we can see, we got almost similar results that there is a significant mean difference between the two groups.

Two Sample t-test

data:  uptake by Treatment
t = 3.0485, df = 82, p-value = 0.003096
alternative hypothesis: true difference in means between group nonchilled and group chilled is not equal to 0
95 percent confidence interval:
  2.38324 11.33581
sample estimates:
mean in group nonchilled    mean in group chilled
                30.64286                 23.78333

How to Perform Paired t-test in R

Paired t-test is a statistical hypothesis that is used to determine if there is a significant difference between the means of two related or paired samples. It calculates the t-test value by comparing the differences between the paired observations while considering the variability within the difference.

dࠡ = differences of mean in paired observations
s_d = differences of standard deviation of sample
n = number of pairs

To perform Paired t-test in R, we have to set paired argument TRUE and run the test again with the same formula and dataset.

t.test(uptake ~ Treatment, paired = TRUE, data = CO2)

Result:

There is a statistically significant difference between the means of the two groups while considering the t and p-value.

Paired t-test

data:  uptake by Treatment
t = 7.939, df = 41, p-value = 8.051e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
5.114589 8.604458
sample estimates:
mean difference
      6.859524

In the second example, we will factor the rate of uptake for two types of the same plant. One originated from Quebec, and another is from Mississippi.

plot(uptake ~ Type, data=CO2)

Let’s check the paired t-test results by replacing the Treatment with the type in the formula.

t.test(uptake ~ Type, paired = TRUE, data = CO2)

Result:

Again, there is a significant difference between the mean of the Quebec and Mississippi group.

 Paired t-test

data:  uptake by Type
t = 11.374, df = 41, p-value = 2.937e-14
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
10.41177 14.90727
sample estimates:
mean difference
      12.65952

Try the t-test in R DataLab Workbook. It comes with code sources and results. You can also duplicate the workbook and start practicing on different examples.

Note: A strong statistics foundation will serve you well, no matter which industry you belong to. Statistics is the backbone of modern AI, and you should start your journey by taking Statistics Fundamentals with R skill track.

How to Interpret t-test Results in R

We are generating the results, but what do df, p-value, alternative hypothesis, or sample estimates mean? In this section, we will learn how to interpret the t-test results in R.

Let’s start by creating two groups using the rnorm function and run the Two Sample t-tests.

set.seed(125)

group1 <- c(rnorm(100, mean = 24, sd = 3))
group2 <- c(rnorm(100, mean = 43, sd = 2.4))

t.test(group1, group2)

Output:

 Welch Two Sample t-test

data:  group1 and group2
t = -47.765, df = 179.99, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-19.51569 -17.96722
sample estimates:
mean of x mean of y
24.30063  43.04208

data: the data used in Two Sample t-test (group1 and group2)
t: t test-statistic. The negative t-value of -47.765 indicates that the group1 sample mean is significantly smaller than group2.
df: it is the degree of freedom associated with the t-test value.
p-value: indicates the statistical significance of the result. The p-value is 2.2e-16 which is lower than alpha (0.005), indicating that the probability of obtaining such a large difference between the two groups by chance is very small.
alternative hypothesis: we can set the alternative hypothesis. In our case, it was set to check if the true difference in means is not equal to zero.
95 percent confidence interval: 95% confident that the true population means the difference between the two groups lies within the range of -19.51569, -17.96722.
sample estimates: it tells us the sample means of each group where group1 and group2 are 24.30063 and 43.04208, respectively. It means that, on average, group2 has a higher value than group1.

There are two hypotheses for the t-test:

H₀: µ₁ = µ₂: the two population mean are equal.
H_A: µ₁ ≠µ₂: the two population means are not equal.

In conclusion, the results of the Welch Two Sample t-test suggest that there is strong evidence that there is a statistically significant difference between group1 and group2.

Conclusion

In this tutorial, we have learned about One Sample, Two Samples, and Paired t-tests with R programming examples and how to interpret the result.

The t-test is one of many statistical tools used in hypothesis testing, and if you want to learn everything about hypothesis testing, take an interactive Hypothesis Testing in R course. The course covers t-tests, ANOVA, proportion tests, and chi-square tests.

You can also go beyond and enroll in our Statistician with R career track to master the essential skills and land a job as a statistician.

Topics

Data Analysis

Tutorial

An Introduction to Python T-Tests

Learn how to perform t-tests in Python with this tutorial. Understand the different types of t-tests - one-sample test, two-sample test, paired t-test, and Welch’s test, and when to use them.

Vidhi Chugh

Tutorial

R Contingency Tables Tutorial

In this tutorial, you'll learn how to create contingency tables and how to test and quantify relationships visible in them.

Łukasz Deryło

Tutorial

Chi-Square Test in R: A Complete Guide

Learn how to create a contingency table and perform chi-square tests in R using the chisq.test() function. Discover practical applications and interpret results with confidence.

Arunn Thevapalan

Tutorial

T-test vs. Z-test: When to Use Each

Use t-tests when dealing with small samples or unknown variance, and Z-tests when samples are large and variance is known.

Arunn Thevapalan

Tutorial

Logistic Regression in R Tutorial

Discover all about logistic regression: how it differs from linear regression, how to fit and evaluate these models it in R with the glm() function and more!

Vidhi Chugh

code-along

A/B Testing in R

Compare the performance of two groups with this introduction to A/B testing in R

Arne Warnke

See More See More