Previously, you learned the fundamentals of both statistical inference and linear models; now, the next step is to put them together. This course gives you a chance to think about how different samples can produce different linear models, when your goal is to understand the underlying population model. From the estimated linear model, you will learn how to create interval estimates for the effect size as well as how to determine whether the effect is statistically significant. Prediction intervals for the response variable will be contrasted with estimates of the average response. Throughout the course, you'll gain more practice with the dplyr and ggplot2 packages, and you will learn about the broom package for tidying models; all three packages are invaluable in data science.
In the first chapter, you will learn how and why to perform inferential (rather than purely descriptive) analysis on a regression model.

Variability in regression lines
Regression output: example I
First random sample, second random sample
Superimpose lines
Research question
Regression hypothesis
Variability of coefficients
Original population - change sample size
Hypothetical population - less variability around the line
Hypothetical population - less variability in x direction
What changes the variability of the coefficients?
Simulation-Based Inference for the Slope Parameter
In this chapter, you will learn about the idea of a sampling distribution, using simulation methods for regression models.

Simulation-based Inference
Null sampling distribution of the slope
SE of the slope
p-value
Inference on slope
Simulation-based CI for slope
Bootstrapping the data
SE method - bootstrap CI for slope
Percentile method - bootstrap CI for slope
Inference from randomization and bootstrapped distributions
t-Based Inference for the Slope Parameter
In this chapter, you will learn how to use the t-distribution to perform inference in linear regression models. You will also learn how to create prediction intervals for the response variable.

Mathematical approximation
How do the theoretical results play a role?
t-statistic
Working with R-output (1)
Working with R-output (2)
Comparing randomization inference and t-inference
Intervals in regression
CI using t-theory
Comparing randomization CIs and t-based CIs
Different types of intervals
Confidence intervals for the average response at specific values
Confidence intervals for the average response for all observations
Prediction intervals for the individual response
Technical Conditions in Linear Regression
In this chapter, you will consider the technical conditions that are important when using linear models to make claims about a larger population.

Technical conditions for linear regression
Violation of LINE conditions (1)
Violation of LINE conditions (2)
Using residuals (1)
Using residuals (2)
Why do we need the LINE assumptions?
Effect of an outlier
Estimation with and without outlier
Inference with and without outlier (t-test)
Inference with and without outlier (randomization)
Moving forward when model assumptions are violated
Adjusting for non-linear relationship
Adjusting for non-constant errors
Adjusting for non-normal errors
Building on Inference in Simple Linear Regression
This chapter covers topics that build on the basic ideas of inference in linear models, including multicollinearity and inference for multiple regression models.

Inference on transformed variables
Transformed model
Interpreting transformed coefficients
Multicollinearity
LA Homes, multicollinearity (1)
LA Homes, multicollinearity (2)
LA Homes, multicollinearity (3)
Multiple linear regression
Inference on coefficients
Interpreting coefficients
Summary
Included in the following track: Statistical Inference
Jo Hardin, Professor at Pomona College
Jo Hardin is a professor of mathematics and statistics at Pomona College. Her statistical research focuses on developing new robust methods for high-throughput data. Recently, she has also worked closely with the statistics education community on ways to integrate data science early into a statistics curriculum. When not working with students or on her research, she loves to put on a pair of running shoes and hit the road.