One of the primary goals of any scientist is to find patterns in data and build models to describe, predict, and extract insight from those patterns. The most fundamental of these patterns is a linear relationship between two variables. This course provides an introduction to exploring, quantifying, and modeling linear relationships in data by demonstrating techniques such as least-squares, linear regression, estimation, and bootstrap resampling. Here you will apply the most powerful modeling tools in the Python data science ecosystem, including scipy, statsmodels, and scikit-learn, to build and evaluate linear models. By exploring the concepts and applications of linear models with Python, this course serves both as a practical introduction to modeling and as a foundation for learning more advanced modeling techniques and tools in statistics and machine learning.
Exploring Linear Trends (Free)
We start the course with an initial exploration of linear relationships, including some motivating examples of how linear models are used, and demonstrations of data visualization methods from matplotlib. We then use descriptive statistics to quantify the shape of our data and use correlation to quantify the strength of linear relationships between two variables.
Introduction to Modeling Data (50 xp)
Reasons for Modeling: Interpolation (100 xp)
Reasons for Modeling: Extrapolation (100 xp)
Reasons for Modeling: Estimating Relationships (100 xp)
Visualizing Linear Relationships (50 xp)
Plotting the Data (100 xp)
Plotting the Model on the Data (100 xp)
Visually Estimating the Slope & Intercept (100 xp)
Quantifying Linear Relationships (50 xp)
Mean, Deviation, & Standard Deviation (100 xp)
Covariance vs Correlation (100 xp)
Correlation Strength (100 xp)
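As a rough illustration of the covariance and correlation quantities this chapter covers, the sketch below computes both from deviations about the mean. The data values and variable names are made up for illustration and are not taken from the course datasets.

```python
import numpy as np

# Hypothetical, made-up data: hike duration (hours) and distance (km).
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
distances = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Deviations of each variable from its own mean.
dt = times - np.mean(times)
dd = distances - np.mean(distances)

# Covariance: the average product of deviations (population form, divide by N).
covariance = np.mean(dt * dd)

# Correlation: covariance normalized by both standard deviations,
# which makes it unitless and bounded between -1 and 1.
correlation = covariance / (np.std(times) * np.std(distances))

print(f"covariance = {covariance:.3f}, correlation = {correlation:.3f}")
```

Covariance carries the units of both variables, so its size is hard to compare across datasets; the normalized correlation is the usual measure of how strong a linear relationship is.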
Building Linear Models
Here we look at the parts that go into building a linear model. Using the concept of a Taylor Series, we focus on the parameters slope and intercept, how they define the model, and how to interpret them in several applied contexts. We apply a variety of Python modules to find the model that best fits the data, by computing the optimal values of slope and intercept, using least-squares, numpy, statsmodels, and scikit-learn.
What makes a model linear (50 xp)
Terms in a Model (50 xp)
Model Components (100 xp)
Model Parameters (100 xp)
Interpreting Slope and Intercept (50 xp)
Linear Proportionality (100 xp)
Slope and Rates-of-Change (100 xp)
Intercept and Starting Points (100 xp)
Model Optimization (50 xp)
Residual Sum of the Squares (100 xp)
Minimizing the Residuals (100 xp)
Visualizing the RSS Minima (100 xp)
Least-Squares Optimization (50 xp)
Least-Squares with `numpy` (100 xp)
Optimization with Scipy (100 xp)
Least-Squares with `statsmodels` (100 xp)
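A minimal sketch of least-squares fitting with numpy, assuming a straight-line model y = a0 + a1*x. The function names `least_squares` and `rss` and the sample data are hypothetical, chosen here for illustration rather than taken from the course exercises.

```python
import numpy as np

def least_squares(x, y):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    # Closed-form solution: slope is the ratio of co-deviation to x deviation.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The best-fit line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares between the data and the line."""
    residuals = y - (intercept + slope * x)
    return np.sum(residuals ** 2)

# Made-up data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

a0, a1 = least_squares(x, y)
print(f"intercept = {a0:.2f}, slope = {a1:.2f}, RSS = {rss(x, y, a0, a1):.2e}")
```

The same parameters can be cross-checked with `np.polyfit(x, y, 1)`, which returns the coefficients highest degree first, that is, slope and then intercept.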
Making Model Predictions
Next we apply models to real data and make predictions. We explore some of the most common pitfalls and limitations of predictions, and we evaluate and compare models by quantifying and contrasting several measures of goodness-of-fit, including RMSE and R-squared.
Modeling Real Data (50 xp)
Linear Model in Anthropology (100 xp)
Linear Model in Oceanography (100 xp)
Linear Model in Cosmology (100 xp)
The Limits of Prediction (50 xp)
Interpolation: Inbetween Times (100 xp)
Extrapolation: Going Over the Edge (100 xp)
Goodness-of-Fit (50 xp)
RMSE Step-by-step (100 xp)
R-Squared (100 xp)
Standard Error (50 xp)
Variation Around the Trend (100 xp)
Variation in Two Parts (100 xp)
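The two goodness-of-fit measures named in this chapter can be sketched as follows. The helper names `rmse` and `r_squared` and the small arrays are illustrative assumptions, not code from the course.

```python
import numpy as np

def rmse(y_data, y_model):
    """Root-mean-square error: the typical size of a residual, in units of y."""
    residuals = y_data - y_model
    return np.sqrt(np.mean(residuals ** 2))

def r_squared(y_data, y_model):
    """Fraction of the variation in y_data that the model accounts for."""
    rss = np.sum((y_data - y_model) ** 2)          # variation left unexplained
    tss = np.sum((y_data - np.mean(y_data)) ** 2)  # total variation about the mean
    return 1.0 - rss / tss

# Made-up measurements and made-up model predictions.
y_data = np.array([1.0, 2.0, 3.0, 4.0])
y_model = np.array([1.5, 1.5, 3.5, 3.5])

print(f"RMSE = {rmse(y_data, y_model):.2f}")
print(f"R-squared = {r_squared(y_data, y_model):.2f}")
```

RMSE is expressed in the units of the data, while R-squared is unitless, which is why the two are often reported together when comparing models.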
Estimating Model Parameters
In our final chapter, we introduce concepts from inferential statistics, and use them to explore how maximum likelihood estimation and bootstrap resampling can be used to estimate linear model parameters. We then apply these methods to make probabilistic statements about our confidence in the model parameters.
Inferential Statistics Concepts (50 xp)
Sample Statistics versus Population (100 xp)
Variation in Sample Statistics (100 xp)
Visualizing Variation of a Statistic (100 xp)
Model Estimation and Likelihood (50 xp)
Estimation of Population Parameters (100 xp)
Maximizing Likelihood, Part 1 (100 xp)
Maximizing Likelihood, Part 2 (100 xp)
Model Uncertainty and Sample Distributions (50 xp)
Bootstrap and Standard Error (100 xp)
Estimating Speed and Confidence (100 xp)
Visualize the Bootstrap (100 xp)
Model Errors and Randomness (50 xp)
Test Statistics and Effect Size (100 xp)
Null Hypothesis (100 xp)
Visualizing Test Statistics (100 xp)
Visualizing the P-Value (100 xp)
Course Conclusion (50 xp)
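One way bootstrap resampling might be used to estimate a slope and its uncertainty is sketched below. The synthetic data, the `bootstrap_slopes` helper, the number of resamples, and the 95% percentile interval are all assumptions made for illustration, not the course's own implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic data (an assumption): true slope 2, true intercept 1, unit Gaussian noise.
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

def bootstrap_slopes(x, y, num_resamples, rng):
    """Refit the line on resampled (x, y) pairs and collect the fitted slopes."""
    n = len(x)
    slopes = np.empty(num_resamples)
    for i in range(num_resamples):
        # Draw n indices with replacement, keeping each x paired with its y.
        idx = rng.integers(0, n, size=n)
        # np.polyfit returns coefficients highest degree first: slope, intercept.
        slopes[i], _ = np.polyfit(x[idx], y[idx], 1)
    return slopes

slopes = bootstrap_slopes(x, y, 1000, rng)

slope_estimate = np.mean(slopes)    # bootstrap estimate of the slope
slope_std_error = np.std(slopes)    # spread of the estimate (standard error)
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])  # 95% percentile interval

print(f"slope = {slope_estimate:.3f} +/- {slope_std_error:.3f}")
print(f"95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling index pairs rather than x and y separately preserves the relationship between the two variables, which is what the slope measures.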
Datasets
Femur length versus body height
Distance hiked versus hike duration
Galaxy distances versus recession velocities
Sea surface height versus year
Mass versus volume of solution
Jason Vestuto
Data Scientist, University of Texas at Austin
Jason Vestuto started life as a musician and later studied physics and taught himself to code to survive. Along the way, he has completed a couple of degrees in physics and another in science education, and discovered that he learns best by trying to teach others. Presently, he works within the Space and Geophysics Lab of the University of Texas at Austin as a Python developer and data scientist focused on GPS satellite navigation and signal processing.