# Measures of Variability

September 22nd, 2014

## Michael Jordan's first NBA season

Michael Jordan was one of the greatest basketball players of all time. Not only did he win six NBA titles with the Chicago Bulls, but he was selected as the Most Valuable Player (MVP) in five different seasons.

You want to explore the performance of "Air Jordan" in terms of points per game in his first season. You are especially interested in the variability of points scored over the course of the season.

### Instructions

• `data_jordan`, which is available in your workspace, contains Michael Jordan's points per game in his first NBA season. Print the data to the console.
• Use the `plot()` function to make a scatterplot of the `points` column. Use the `main` argument to title the plot `"Michael Jordan's first season"`.
• Calculate the `mean()` points per game and save the result to `mean_jordan`.
• Add a horizontal line to the plot to show the mean points per game using the `abline()` function with one argument: `h` (for "horizontal"). For example, `abline(h = 7)` would create a horizontal line at the value 7.

```
data_jordan <- read.table("http://assets.datacamp.com/course/Conway/Lecture_Data/L15-L16_game_points_jordan.txt", header = TRUE)
```

```
## The dataset `data_jordan` is already loaded

# View Michael Jordan's first season data

# Make a scatterplot of his points per game

# Calculate mean_jordan

# Add horizontal line with abline()
```

```
## The dataset `data_jordan` is already loaded

# View Michael Jordan's first season data
data_jordan

# Make a scatterplot of his points per game
plot(data_jordan$points, main = "Michael Jordan's first season")

# Calculate mean_jordan
mean_jordan <- mean(data_jordan$points)

# Add horizontal line with abline()
abline(h = mean_jordan)
```

```
test_output_contains("data_jordan",
                     incorrect_msg = "Print <code>data_jordan</code> to the console. You can do this by just typing its name.")
test_function("plot", args = c("x", "main"))
test_object("mean_jordan",
            incorrect_msg = "Did you calculate the mean points per game and assign the result to <code>mean_jordan</code>?")
test_function("abline", args = "h",
              incorrect_msg = "Draw the mean amount of points that Michael Jordan scored as a horizontal line on the scatterplot. You should use the function <code>abline</code> with the argument <code>h</code> for this. If you don't know how this function works, you can always type <code>?abline</code> to see the help file.")
success_msg("Nice! Take a look at your plot. Do you understand the concept of deviation with respect to the mean? If so, move on to the next exercise.")
```

Call the `plot()` function with two arguments: the column of data you wish to plot and the `main` argument with the appropriate title. Use the `$` operator to select the `points` column from `data_jordan`.
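
To make the idea of deviation from the mean more concrete, here is a minimal sketch on made-up numbers. The data frame `toy_games` and its values are hypothetical and are not taken from `data_jordan`; the dashed segments drawn with base R's `segments()` are an illustrative extra, not part of the exercise.

```r
# Hypothetical scores for illustration only; these are NOT Michael Jordan's data
toy_games <- data.frame(points = c(12, 30, 21, 25, 17))

# Select the column with `$` and compute its mean
toy_mean <- mean(toy_games$points)

# Scatterplot of points against game index, with the mean as a horizontal line
plot(toy_games$points, main = "Toy example: points per game")
abline(h = toy_mean)

# Draw each game's deviation from the mean as a dashed vertical segment
game_index <- seq_along(toy_games$points)
segments(x0 = game_index, y0 = toy_mean,
         x1 = game_index, y1 = toy_games$points, lty = 2)

# The deviations themselves: each score minus the mean
toy_deviations <- toy_games$points - toy_mean
print(toy_deviations)
```

In this toy data the mean is 21, so the deviations are -9, 9, 0, 4 and -4. They sum to zero, which is why the variance is built from squared deviations rather than from the raw deviations.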

## Calculate the variance manually

As a reminder, we use the following process to calculate the sample variance:

1. Calculate the sample mean
2. Calculate the squared difference between each data point and the sample mean
3. Sum these squared differences (i.e. compute the sum of squares)
4. Divide the sum of squares by \(N-1\) (i.e. the sample size minus 1)
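
The four steps above can be condensed into a single formula. The symbols \(x_i\) (points scored in game \(i\)), \(\bar{x}\) (the sample mean) and \(s^2\) (the sample variance) are notation assumed here for illustration; only \(N\) appears in the original text.

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
\]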

Let's calculate the sample variance of Michael Jordan's points per game!

### Instructions

The dataset `data_jordan` is loaded into your workspace.

• Calculate the mean points per game and save the result to `mean_ppg`.
• Subtract the mean points per game from the vector of points scored in each game and assign the result to `diff`.
• Square this vector of differences and save to `squared_diff`.
• Calculate the sample variance: sum the values in `squared_diff` with `sum()`, then divide by the sample size minus 1. Use `length()` to count the number of games in the sample. Just print the result without saving it.
• Check your result by calculating the variance with R's built-in `var()` function.

```
data_jordan <- read.table("http://assets.datacamp.com/course/Conway/Lecture_Data/L15-L16_game_points_jordan.txt", header = TRUE)
```

```
## The dataset `data_jordan` is already loaded

# Calculate mean points per game
mean_ppg <- ___

# Calculate deviations from mean
diff <- ___

# Calculate squared deviations
squared_diff <- ___

# Combine everything to compute sample variance

# Compare with the result of var()
```

```
## The dataset `data_jordan` is already loaded

# Calculate mean points per game
mean_ppg <- mean(data_jordan$points)

# Calculate deviations from mean
diff <- data_jordan$points - mean_ppg

# Calculate squared deviations
squared_diff <- diff^2

# Combine everything to compute sample variance
sum(squared_diff) / (length(data_jordan$points) - 1)

# Compare with the result of var()
var(data_jordan$points)
```

```
test_object("mean_ppg",
            undefined_msg = "Save the mean points per game to a new variable called `mean_ppg`",
            incorrect_msg = "Use `mean(data_jordan$points)` to compute the mean points per game and save the result to `mean_ppg`")
test_object("diff",
            undefined_msg = "Subtract `mean_ppg` from `data_jordan$points` and save the result to `diff`",
            incorrect_msg = "Subtract `mean_ppg` from `data_jordan$points` and save the result to `diff`")
test_object("squared_diff",
            undefined_msg = "Square `diff` with `diff^2` and save the result to `squared_diff`",
            incorrect_msg = "Square `diff` with `diff^2` and save the result to `squared_diff`")
test_correct({
  test_function("sum", not_called_msg = "Don't forget to use the `sum()` function when manually computing the sample variance!")
  test_function("length", not_called_msg = "Don't forget to use the `length()` function when manually computing the sample variance!")
}, {
  test_output_contains("sum(squared_diff) / (length(data_jordan$points) - 1)",
                       incorrect_msg = "Sum the values in `squared_diff` with `sum()`, then divide by `length(data_jordan$points) - 1`")
})
test_student_typed("var(data_jordan$points)",
                   not_typed_msg = "Call `var()` with one argument: the `points` column of `data_jordan`")
test_error()
success_msg("Great work!")
```

• Calculate the mean points per game using the `mean()` function.
• You can select a column from a data frame by using the `$` operator.
• Use the `^` operator to square every element in a vector. For example, `c(1, 2, 3)^2` will result in `c(1, 4, 9)`.
• When you calculate the sample variance, do not forget to divide by `length(data) - 1` instead of `length(data)`.
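
To see why the last hint matters, here is a small sketch on a made-up vector. `toy_points` and its values are hypothetical and are not taken from `data_jordan`. It walks through the same steps as the exercise and compares dividing by `n - 1` with dividing by `n`.

```r
# Hypothetical vector for illustration only (not from data_jordan)
toy_points <- c(2, 4, 9)

# Step 1: sample mean, (2 + 4 + 9) / 3 = 5
toy_mean <- mean(toy_points)

# Steps 2 and 3: squared deviations (9, 1, 16) and their sum, 26
sum_of_squares <- sum((toy_points - toy_mean)^2)

# Step 4: divide by the sample size minus 1, giving 26 / 2 = 13
n <- length(toy_points)
sample_variance <- sum_of_squares / (n - 1)

# Dividing by n instead gives a smaller value, 26 / 3
divided_by_n <- sum_of_squares / n

# R's built-in var() uses the n - 1 denominator, so it matches sample_variance
matches_var <- isTRUE(all.equal(sample_variance, var(toy_points)))
print(c(sample_variance = sample_variance, divided_by_n = divided_by_n))
print(matches_var)
```

With only three observations the gap between the two denominators is large (13 versus about 8.67). It shrinks as the sample grows, but `var()` always corresponds to the `n - 1` version.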