# Measures of Central Tendency

September 22nd, 2014

## The mean of a Fibonacci sequence

In the last video, we used the wine ratings example to illustrate how means are calculated in practice. Here, you will use the numbers of the Fibonacci sequence instead. Remember that this sequence starts with 0 and 1, and each subsequent number is the sum of the two previous numbers.
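The rule above can also be written directly in R instead of typing the numbers out. The sketch below is only an illustration. The helper name fib_first_n() is hypothetical and not part of the exercise, which asks you to build the vector with c().

```r
# Hypothetical helper: build the first n Fibonacci numbers from the rule
# "start with 0 and 1; each later number is the sum of the two before it".
fib_first_n <- function(n) {
  stopifnot(n >= 2)
  fib <- c(0, 1)                          # the sequence starts with 0 and 1
  while (length(fib) < n) {
    k <- length(fib)
    fib <- c(fib, fib[k - 1] + fib[k])    # sum of the two previous numbers
  }
  fib
}

fib_first_n(8)
# [1]  0  1  1  2  3  5  8 13
```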

### Instructions

• Create a new vector fibonacci that lists the first eight numbers of the Fibonacci sequence in order.
• Calculate the mean of fibonacci manually, using sum() to add up the numbers in the vector and length() to count how many numbers are in the vector (N).
• Calculate the mean of fibonacci using the mean() function.
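Written out, the manual calculation divides the sum of the values by their count $N$. For the first eight Fibonacci numbers this gives:

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{0+1+1+2+3+5+8+13}{8} = \frac{33}{8} = 4.125
$$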
```r
# Create a vector that contains the first eight Fibonacci numbers
fibonacci <- c(0, 1, 1, 2, 3, 5, 8, 13)

# Calculate the mean manually using sum() and length()
mean_manual <- sum(fibonacci) / length(fibonacci)

# Calculate the mean the easy way
mean_check <- mean(fibonacci)
```

The first eight Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13. Recall that you use c() to create a vector.

## Setting up histograms

Can you reproduce the histograms showing the distributions of ratings for the Australian wines? As you saw in the video, the distributions for the Shiraz (red) and the Pinot Grigio (white) should look different.

### Instructions

• A data frame containing the Australian wine ratings, wine_data, has been loaded into your workspace. Take a look by printing it to the console.
• Create two subsets red_wine and white_wine using the subset() function.
• Plot a histogram of the ratings for each subset. Label both x-axes with "Ratings" and title each histogram according to the type of wine displayed: "Shiraz" or "Pinot Grigio".
```r
# Load the data (already available in the exercise workspace)
wine_data <- read.table("http://assets.datacamp.com/course/Conway/Lecture_Data/L15-16_Wine_Australia_Red_White.txt",
                        dec = ",", header = TRUE)

# Print the data in order to figure out which subsets are included
wine_data

# Create red_wine and white_wine from the condition column
red_wine <- subset(wine_data, condition == "Red")
white_wine <- subset(wine_data, condition == "White")

# Plot the histograms of the ratings of both subsets side by side
par(mfrow = c(1, 2))
hist(red_wine$Ratings, xlab = "Ratings", main = "Shiraz")
hist(white_wine$Ratings, xlab = "Ratings", main = "Pinot Grigio")
```

For example, to plot a histogram of the Shiraz ratings, use:

```r
hist(red_wine$Ratings, xlab = "Ratings", main = "Shiraz")
```
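To try the subsetting and side-by-side plotting without the course data, here is a small sketch on a made-up data frame. It assumes, as the exercise code does, that the wine data has a condition column holding "Red" or "White" and a numeric Ratings column. The rating values and the names toy_wine, toy_red and toy_white are hypothetical. The sketch also stores the settings returned by par() and restores them afterwards, so later plots are not split into two panels.

```r
# Hypothetical toy data mimicking the assumed layout of wine_data;
# the ratings are invented for illustration only.
toy_wine <- data.frame(
  condition = c("Red", "Red", "Red", "White", "White", "White"),
  Ratings   = c(6.5, 7.0, 8.5, 5.0, 5.5, 6.0)
)

# subset() evaluates the condition inside the data frame,
# so the column can be referenced by name rather than by position
toy_red   <- subset(toy_wine, condition == "Red")
toy_white <- subset(toy_wine, condition == "White")

# Two panels side by side; par() returns the old settings
old_par <- par(mfrow = c(1, 2))
hist(toy_red$Ratings, xlab = "Ratings", main = "Shiraz")
hist(toy_white$Ratings, xlab = "Ratings", main = "Pinot Grigio")
par(old_par)  # restore the previous layout
```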

## Robustness to outliers

Measures of central tendency attempt to describe the middle or center point of a distribution. In the presence of outliers, or extreme values, the median is usually preferred over the mean. The reason is that the mean can be "dragged" up or down by extreme values. The median depends only on the middle of the ordered data, not on how far the extreme values lie from the rest, so it is much less influenced by outliers.
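A quick way to see this difference is to add a single extreme value to a small set of numbers. The ratings below are invented for illustration and are not the course's wine data.

```r
# Hypothetical ratings, made up for illustration
ratings <- c(6, 7, 7, 8, 9)
ratings_outlier <- c(ratings, 0)   # add one extreme low rating

mean(ratings)            # 7.4
mean(ratings_outlier)    # 37 / 6, about 6.17
median(ratings)          # 7
median(ratings_outlier)  # still 7: the sorted values are 0 6 7 7 8 9
```

Here the median does not move at all because the two middle values happen to be equal. In general the median can shift a little when a value is added, but only to a nearby value in the middle of the ordered data, however extreme the new value is. The mean, by contrast, moves further the more extreme the new value is.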

A person who does not like wine at all enters the wine ratings survey and makes a statement by giving the Shiraz the lowest possible score of zero. Let's see how it affects the mean and median of the score distribution.

### Instructions

We've made available to you both the original red_wine data and red_wine_extreme, which contains the original ratings plus the new extreme rating.

• Calculate the change in mean rating after adding the new extreme value. Use the mean() function and save the result to diff_mean.
• Calculate the change in median rating after adding the new extreme value. Use the median() function and save the result to diff_median.
• Print both differences to see which measure of central tendency is less affected by the addition of the extreme rating.
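The change in the mean can be worked out exactly. Here $N$ is the number of original ratings, $\bar{x}$ is their mean and $v$ is the added rating; these symbols are introduced only for this derivation.

$$
\bar{x}_{\text{new}} - \bar{x} = \frac{N\bar{x} + v}{N+1} - \bar{x} = \frac{N\bar{x} + v - (N+1)\bar{x}}{N+1} = \frac{v - \bar{x}}{N+1}
$$

With $v = 0$ the change is $-\bar{x}/(N+1)$, so diff_mean should come out negative. The larger the sample, the smaller the effect of a single extreme rating on the mean.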
```r
# Setup (already done for you in the exercise workspace)
wine_data <- read.table("http://assets.datacamp.com/course/Conway/Lecture_Data/L15-16_Wine_Australia_Red_White.txt",
                        dec = ",", header = TRUE)
red_wine <- subset(wine_data, condition == "Red")
# Append one extreme rating of 0 to the Shiraz ratings
red_wine_extreme <- rbind(red_wine, data.frame(condition = "Red", Ratings = 0))

# Calculate the change in mean
diff_mean <- mean(red_wine_extreme$Ratings) - mean(red_wine$Ratings)

# Calculate the change in median
diff_median <- median(red_wine_extreme$Ratings) - median(red_wine$Ratings)

# Print both differences
diff_mean
diff_median
```

Remember to reference the Ratings column of the data frame when calculating your means and medians, for example mean(red_wine_extreme$Ratings). In both cases, subtract the old measure from the new one.