Course

Inference for Categorical Data in R

Skill Level: Advanced

4.8+

Updated 12/2021

In this course you'll learn how to leverage statistical techniques for working with categorical data.

R, Probability & Statistics

4 hr

14 videos

53 Exercises

4,000 XP

10,694

Statement of Accomplishment

Course Description

Categorical data is all around us. It's in the latest opinion polling numbers, in the data that leads to new breakthroughs in genomics, and in the troves of data that internet companies collect to sell products to you. In this course you'll learn techniques for separating the signal from the noise: tools for identifying when structure in this data reflects an interesting phenomenon and when it is just random variation.

Prerequisites

Foundations of Inference in R

1

Inference for a single parameter

In this chapter you will learn how to perform statistical inference on a single parameter that describes categorical data. This includes both resampling based methods and approximation based methods for a single proportion.

The General Social Survey

Exploring consci

Generating via bootstrap

Constructing a CI

Why more bootstraps?

Interpreting a Confidence Interval

CIs and confidence level

SE with less data

SE with different p

The approximation shortcut

CI via approximation

Methods compared
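As a rough illustration of the two approaches this chapter names, the sketch below builds an approximate 95% confidence interval for a single proportion twice: once by bootstrapping and once with the normal-approximation standard error. It uses base R only, and the data are made up for the example (they are not General Social Survey values); the course exercises may use different tooling.

```r
# Hypothetical responses: 1 = "yes", 0 = "no" (illustrative, not real survey data)
set.seed(1)
responses <- rep(c(1, 0), times = c(45, 105))
n <- length(responses)
p_hat <- mean(responses)

# Resampling-based method: bootstrap the sample proportion
boot_props <- replicate(2000, mean(sample(responses, size = n, replace = TRUE)))
se_boot <- sd(boot_props)
ci_boot <- c(lower = p_hat - 2 * se_boot, upper = p_hat + 2 * se_boot)

# Approximation-based method: standard error from the formula sqrt(p(1 - p) / n)
se_approx <- sqrt(p_hat * (1 - p_hat) / n)
ci_approx <- c(lower = p_hat - 2 * se_approx, upper = p_hat + 2 * se_approx)

print(rbind(bootstrap = ci_boot, approximation = ci_approx))
```

With a sample of this size and a proportion not too close to 0 or 1, the two intervals should be close to each other.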

2

Proportions: testing and power

This chapter dives deeper into performing hypothesis tests and creating confidence intervals for a single parameter. Then, you'll learn how to perform inference on a difference between two proportions. Finally, this chapter wraps up with an exploration of what happens when you know the null hypothesis is true.

Hypothesis test for a proportion

Life after death

Generating from H0

Testing a claim

Making a decision

Intervals for differences

Death penalty and sex

Hypothesis test on the difference in proportions

Interpreting the test

Hypothesis tests and confidence intervals

Statistical errors

When the null is true

When the null is true: decision
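A minimal sketch of a hypothesis test for a single proportion, in the spirit of "generating from H0": simulate many samples under the null hypothesis and see how often they are at least as extreme as the observed result. The sample size, observed count, and null value are hypothetical numbers chosen for illustration, and only base R is used.

```r
set.seed(2)
n <- 200              # hypothetical sample size
observed_yes <- 116   # hypothetical observed count of "yes" responses
p_null <- 0.5         # null hypothesis H0: p = 0.5

# Generate counts of "yes" for 5000 samples drawn under H0
null_counts <- rbinom(5000, size = n, prob = p_null)

# Two-sided p-value: share of simulated counts at least as far from the
# expected count (n * p_null) as the observed count is
expected_yes <- n * p_null
p_value <- mean(abs(null_counts - expected_yes) >= abs(observed_yes - expected_yes))

print(p_value)
```

A small p-value indicates that a result like the observed one would be unusual if the null hypothesis were true.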

3

Comparing many parameters: independence

This part of the course will teach you how to use both resampling methods and classical methods to test for the independence of two categorical variables. This chapter covers how to perform a Chi-squared test.

Contingency tables

Politics and Space

Understanding contingency tables

From tidy to table to tidy

Chi-squared test statistic

A single permuted Chi-sq

Building two null distributions

Is the data consistent with the model?

Alternate method: the chi-squared distribution

Checking conditions

The geography of happiness

A p-value two ways

Intervals for the chi-squared distribution
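The sketch below shows one way to compute "a p-value two ways" for a test of independence: once from a permutation null distribution and once from the Chi-squared distribution. The two variables and their counts are invented for illustration, and the code relies only on base R (`table()`, `sample()`, and `chisq.test()`).

```r
set.seed(3)

# Hypothetical data: party affiliation and opinion on space funding
party <- rep(c("Dem", "Ind", "Rep"), times = c(60, 50, 40))
space <- c(rep(c("too little", "too much"), times = c(20, 40)),
           rep(c("too little", "too much"), times = c(25, 25)),
           rep(c("too little", "too much"), times = c(25, 15)))

# Contingency table and observed Chi-squared statistic
tab <- table(party, space)
obs <- chisq.test(tab)
obs_stat <- unname(obs$statistic)

# Resampling method: shuffle one variable to break any association,
# then recompute the statistic to build a null distribution
null_stats <- replicate(1000, {
  shuffled <- sample(space)
  unname(chisq.test(table(party, shuffled))$statistic)
})
p_perm <- mean(null_stats >= obs_stat)

# Classical method: p-value from the Chi-squared distribution
p_approx <- obs$p.value

print(c(permutation = p_perm, approximation = p_approx))
```

When the expected counts in every cell are reasonably large, as they are here, the two p-values should roughly agree.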

4

Comparing many parameters: goodness of fit

The course wraps up with two case studies using election data. Here, you'll learn how to use a Chi-squared test to check goodness-of-fit. You'll study election results from Iran and Iowa and test if Benford's law applies to these datasets.

Case study: election fraud

Getting to know the Iran data

Breaking it down by province

Extracting the first digit I

Goodness of fit

Goodness of fit test

A p-value, two ways

Is this evidence of fraud?

And now to US

Getting to know the Iowa data

Extracting the first digit II

Testing Iowa

Fraud in Iowa?

Election fraud in Iran and Iowa: debrief
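To make the goodness-of-fit idea concrete, here is a small sketch that extracts first digits from a vector of counts and compares them with Benford's law, under which the first digit d occurs with probability log10(1 + 1/d). The vote counts are simulated for the example and are not the Iran or Iowa data; the code uses base R only.

```r
# Benford's law probabilities for first digits 1 through 9
p_benford <- log10(1 + 1 / (1:9))

# Hypothetical vote totals spread over several orders of magnitude
set.seed(4)
votes <- as.integer(round(10^runif(500, min = 2, max = 5)))

# Extract the first digit of each count and tabulate all nine digits
first_digit <- as.integer(substr(as.character(votes), 1, 1))
digit_counts <- table(factor(first_digit, levels = 1:9))

# Chi-squared goodness-of-fit test against the Benford probabilities
gof <- chisq.test(digit_counts, p = p_benford)

print(digit_counts)
print(gof)
```

A large p-value here means the first digits are consistent with Benford's law; a small one means they are not, which by itself is not proof of fraud.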

Rating: 4.8 from 107 reviews

Rating distribution: 84%, 14%, 1%, 1%, 0%

FAQs

Is this course suitable for beginners?

No. This course is aimed at advanced learners.

Will I receive a certificate at the end of the course?

Yes, once you have completed the course and met the requirements, you will receive a certificate verifying your accomplishment.

What topics are covered in the course?

The topics covered in the course include single proportion inference, testing and power, independence, and goodness of fit.

What software do I need to take the course?

This course is conducted in the programming language R and requires some prior knowledge of R syntax.

How long does the course take?

This course typically takes about 4 hours to complete.

Who will benefit from this course?

Analysts, marketing professionals, and anyone working with categorical data would benefit from the insights gained from this course. It can also be used to gain insights into genomics, opinion polling, and online products.

What is a Chi-squared test and how is it used?

A Chi-squared test is a statistical test used to determine whether the observed frequencies in one or more categories differ significantly from the frequencies expected under a null hypothesis. In this course it is used in two ways: to test whether two categorical variables are independent, and to test whether observed counts fit a hypothesized distribution (goodness of fit).
