
Foundations of Inference in R

Learn how to draw conclusions about a population from a sample of data via a process known as statistical inference.

4 hours · 17 videos · 58 exercises

33,728 learners · Statement of Accomplishment

Course Description

One of the foundational aspects of statistical analysis is inference, the process of drawing conclusions about a larger population from a sample of data. Although it may seem counterintuitive, the standard practice is to try to disprove a claim that is not of research interest (the null hypothesis). For example, to show that one medical treatment is better than another, we can assume that the two treatments lead to equal survival rates and then check whether the data contradict that assumption. We also introduce the p-value, which measures the degree of disagreement between the data and the hypothesis. Finally, we cover confidence intervals, which estimate the magnitude of the effect of interest (e.g., how much better one treatment is than another).
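To make the treatment example concrete, here is a minimal R sketch of a randomization (permutation) test. The survival counts are hypothetical, and the helper `diff_props()` is an illustrative name, not part of the course materials. Shuffling the treatment labels simulates the null hypothesis of equal survival rates, and the p-value is the share of shuffled differences at least as large as the observed one.

```r
set.seed(2024)

# Hypothetical data (illustrative only): 1 = patient survived, 0 = did not
treatment <- rep(c("A", "B"), each = 50)
survived  <- c(rep(1, 38), rep(0, 12),   # treatment A: 38 of 50 survive
               rep(1, 28), rep(0, 22))   # treatment B: 28 of 50 survive

# Statistic of interest: difference in survival proportions (A minus B)
diff_props <- function(outcome, group) {
  mean(outcome[group == "A"]) - mean(outcome[group == "B"])
}
obs_diff <- diff_props(survived, treatment)

# Null model: survival is unrelated to treatment, so shuffling the labels
# gives datasets that are just as plausible under H0
null_diffs <- replicate(1000, diff_props(survived, sample(treatment)))

# One-sided p-value: how often a shuffled difference is at least as large
p_value <- mean(null_diffs >= obs_diff)
print(obs_diff)
print(p_value)
```

A small p-value says the observed gap would rarely arise if the treatments were equivalent, which is evidence against the equal-survival assumption.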

1
Introduction to ideas of inference
Free
In this chapter, you will investigate how repeated samples taken from a population can vary. It is the variability in samples that allows you to make claims about the population of interest. It is important to remember that the research claims of interest focus on the population, while the information available comes only from the sample data.
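A minimal sketch of sample-to-sample variability in base R, assuming a hypothetical population of 10,000 people in which 40% have some trait. Each repeated sample of 50 yields a slightly different proportion, and the spread of those proportions is what inference relies on.

```r
set.seed(1)

# Hypothetical population: 10,000 people, 40% of whom have the trait (1)
population <- rep(c(1, 0), times = c(4000, 6000))

# Draw 1,000 independent samples of size 50 and record each sample proportion
sample_props <- replicate(1000, mean(sample(population, size = 50)))

print(summary(sample_props))  # sample proportions scatter around 0.40
print(sd(sample_props))       # close to sqrt(0.4 * 0.6 / 50), about 0.069
```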
Welcome to the course!
50 xp
Hypotheses (1)
50 xp
Hypotheses (2)
50 xp
Randomized distributions
50 xp
Working with the NHANES data
100 xp
Calculating statistic of interest
100 xp
Randomized data under null model of independence
100 xp
Randomized statistics and dotplot
100 xp
Randomization density
100 xp
Using the randomization distribution
50 xp
Do the data come from the population?
100 xp
What can you conclude?
50 xp
Study conclusions
50 xp
2
Completing a randomization test: gender discrimination
In this chapter, you will gain the tools and knowledge to complete a full hypothesis test. That is, given a dataset, you will know whether or not it is appropriate to reject the null hypothesis in favor of the research claim of interest.
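The steps of a complete randomization test can be sketched as follows. The promotion counts are hypothetical (not the course's discrimination datasets), `prop_diff()` is an illustrative helper, and the 5% significance level is an assumed choice. The sketch builds the null distribution by shuffling labels, marks a two-sided critical region, and computes a two-sided p-value.

```r
set.seed(42)
alpha <- 0.05  # assumed significance level

# Hypothetical data: 48 files, half labelled "male", half "female";
# 1 = promoted, 0 = not promoted (illustrative counts only)
gender   <- rep(c("male", "female"), each = 24)
promoted <- c(rep(1, 21), rep(0, 3),    # male: 21 of 24 promoted
              rep(1, 12), rep(0, 12))   # female: 12 of 24 promoted

prop_diff <- function(y, g) mean(y[g == "male"]) - mean(y[g == "female"])
obs_diff  <- prop_diff(promoted, gender)

# Randomization distribution under H0: promotion is independent of gender
null_diffs <- replicate(1000, prop_diff(promoted, sample(gender)))

# Two-sided critical region: the outer alpha / 2 of the null distribution in each tail
crit <- quantile(null_diffs, probs = c(alpha / 2, 1 - alpha / 2))
reject_h0 <- obs_diff < crit[1] || obs_diff > crit[2]

# Two-sided p-value: double the smaller tail proportion (capped at 1)
p_value <- min(1, 2 * min(mean(null_diffs >= obs_diff),
                          mean(null_diffs <= obs_diff)))
print(c(observed = obs_diff, p_value = p_value))
print(reject_h0)
```

The critical-region check and the p-value lead to the same decision: rejecting H0 when the observed difference falls in the critical region is equivalent to the p-value being below `alpha`.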
Example: gender discrimination
50 xp
Gender discrimination hypotheses
50 xp
Summarizing gender discrimination
100 xp
Step-by-step through the permutation
100 xp
Randomizing gender discrimination
100 xp
Distribution of statistics
50 xp
Reflecting on analysis
50 xp
Critical region
100 xp
Two-sided critical region
100 xp
Why 0.05?
50 xp
How does sample size affect results?
50 xp
Sample size in randomization distribution
100 xp
Sample size for critical region
100 xp
What is a p-value?
50 xp
Calculating the p-values
100 xp
Practice calculating p-values
100 xp
Calculating two-sided p-values
100 xp
Summary of gender discrimination
50 xp
3
Hypothesis testing errors: opportunity cost
You will continue learning about hypothesis testing with a new example and the same structure of randomization tests. In this chapter, however, the focus will be on the different types of errors (type I and type II): how they are made, when one is worse than the other, and how factors such as sample size and effect size affect the error rates.
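To see what type I and type II error rates mean in practice, the sketch below simulates many hypothetical studies and counts how often a test makes each kind of mistake. For speed it swaps in base R's `prop.test()` (a chi-squared test for two proportions) instead of a randomization test; the group size, true proportions, and 5% level are all assumptions chosen for illustration, and `rejects_h0()` is a hypothetical helper.

```r
set.seed(7)
alpha  <- 0.05  # assumed significance level
n      <- 100   # hypothetical number of subjects per group
n_sims <- 2000  # number of simulated studies per scenario

# Simulate one study and return TRUE if H0 (equal proportions) is rejected
rejects_h0 <- function(p1, p2) {
  x <- c(rbinom(1, n, p1), rbinom(1, n, p2))
  prop.test(x, c(n, n))$p.value < alpha
}

# Type I error rate: H0 is true (p1 == p2), yet the test rejects
type1 <- mean(replicate(n_sims, rejects_h0(0.5, 0.5)))

# Type II error rate: H0 is false (p1 != p2), yet the test fails to reject
type2 <- 1 - mean(replicate(n_sims, rejects_h0(0.5, 0.7)))

print(c(type1 = type1, type2 = type2))
```

Increasing `n`, or moving the second proportion further from 0.5 (a larger effect size), should shrink the type II rate, while the type I rate stays near or below `alpha`.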
Example: opportunity cost
50 xp
Summarizing opportunity cost (1)
100 xp
Plotting opportunity cost
100 xp
Randomizing opportunity cost
100 xp
Summarizing opportunity cost (2)
100 xp
Opportunity cost conclusion
50 xp
Errors and their consequences
50 xp
Different choice of error rate
50 xp
Errors for two-sided hypotheses
50 xp
p-value for two-sided hypotheses: opportunity costs
100 xp
Summary of opportunity costs
50 xp
4
Confidence intervals
As a complement to hypothesis testing, confidence intervals allow you to estimate a population parameter. Recall that your interest is always in some characteristic of the population, but the sample data give you only incomplete information with which to estimate that parameter. Here, the parameter is the true proportion of successes in a population. Bootstrapping is used to estimate the variability needed to form the confidence interval.
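A minimal sketch of two bootstrap intervals for a proportion, assuming a hypothetical poll of 50 respondents (not the course's polling data). Each bootstrap sample resamples the original number of observations with replacement. The t-style interval uses p-hat plus or minus two bootstrap standard errors (the empirical rule), and the percentile interval takes the middle 95% of the bootstrap p-hats.

```r
set.seed(123)

# Hypothetical poll: 1 = supports the candidate, 0 = does not (30 of 50 say yes)
poll  <- c(rep(1, 30), rep(0, 20))
p_hat <- mean(poll)

# Bootstrap: resample the original number of observations WITH replacement
boot_p_hats <- replicate(1000,
  mean(sample(poll, size = length(poll), replace = TRUE)))

# Bootstrap t-style interval: p-hat +/- 2 bootstrap standard errors
# (empirical rule: about 95% of a bell-shaped distribution lies within 2 SDs)
se_boot <- sd(boot_p_hats)
ci_t    <- c(lower = p_hat - 2 * se_boot, upper = p_hat + 2 * se_boot)

# Bootstrap percentile interval: middle 95% of the bootstrap p-hats
ci_pct <- quantile(boot_p_hats, probs = c(0.025, 0.975))

print(ci_t)
print(ci_pct)
```

With a bell-shaped bootstrap distribution the two intervals should be close; they diverge more when p-hat is near 0 or 1 or the sample is small.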
Parameters and confidence intervals
50 xp
What is the parameter?
50 xp
Hypothesis test or confidence interval?
50 xp
Bootstrapping
50 xp
Resampling from a sample
100 xp
Visualizing the variability of p-hat
100 xp
Always resample the original number of observations
50 xp
Variability in p-hat
50 xp
Empirical Rule
100 xp
Bootstrap t-confidence interval
100 xp
Bootstrap percentile interval
100 xp
Interpreting CIs and technical conditions
50 xp
Sample size effects on bootstrap CIs
100 xp
Sample proportion value effects on bootstrap CIs
100 xp
Percentile effects on bootstrap CIs
100 xp
Summary of statistical inference
50 xp

In the following tracks

Statistical Inference with R, Statistician with R

Datasets

All polls, Polling data, Big discrimination dataset, New discrimination dataset, Small discrimination dataset

Collaborators

Nick Carchedi

Tom Jeon

Prerequisites

Introduction to Regression in R, Hypothesis Testing in R

Jo Hardin

Professor at Pomona College

Jo Hardin is a professor of mathematics and statistics at Pomona College. Her statistical research focuses on developing new robust methods for high-throughput data. Recently, she has also worked closely with the statistics education community on ways to integrate data science early into a statistics curriculum. When not working with students or on her research, she loves to put on a pair of running shoes and hit the road.
