## Introduction to Statistics in R

Run the hidden code cell below to import the data used in this course.

```r
library(dplyr)
library(ggplot2)

# Custom quantiles (template: substitute your own data, variable, from, to, by)
quantile(data$variable, probs = seq(from, to, by))
seq(0, 1, 0.2) # From 0 to 1 in steps of 0.2

# Calculate variance and sd of co2_emission for each food_category
food_consumption %>%
  group_by(food_category) %>%
  summarize(var_co2 = var(co2_emission),
            sd_co2 = sd(co2_emission))

# Create subgraphs for each food_category: histogram of co2_emission
ggplot(food_consumption, aes(co2_emission)) +
  # Create a histogram
  geom_histogram() +
  # Create a separate sub-graph for each food_category
  facet_wrap(~ food_category)
```

A probability distribution describes the probability of each possible outcome in a scenario. The expected value is the mean of the probability distribution.
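As a small illustration of the expected value, the sketch below uses a fair six-sided die. The die is a made-up example, not part of the course data, and the variable names are chosen here for illustration only.

```r
# Hypothetical example: a fair six-sided die
outcomes <- 1:6
probs <- rep(1 / 6, 6)

# The probabilities of a valid distribution sum to 1
total_prob <- sum(probs)

# Expected value: each outcome weighted by its probability
expected_value <- sum(outcomes * probs)
expected_value
```

For a discrete distribution like this one, the expected value is the probability-weighted average of the outcomes, so it need not be an outcome that can actually occur.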

```r
# Distributions and calculating probabilities
punif  # uniform
pnorm  # normal
pbinom # binomial
ppois  # poisson
pexp   # exponential
```
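Each of the functions listed above returns a cumulative probability, P(X <= q), for its distribution. The calls below are a minimal sketch of how the arguments fit together; all parameter values are arbitrary and chosen only for illustration.

```r
# Each p-function returns P(X <= q); the parameter values are made up
p_unif  <- punif(10, min = 0, max = 30)         # uniform on [0, 30]
p_norm  <- pnorm(7500, mean = 5000, sd = 2000)  # normal
p_binom <- pbinom(1, size = 3, prob = 0.5)      # at most 1 success in 3 trials
p_pois  <- ppois(2, lambda = 4)                 # at most 2 events, mean rate 4
p_exp   <- pexp(1, rate = 0.5)                  # exponential with rate 0.5

# lower.tail = FALSE returns the upper tail, P(X > q), instead
p_norm_upper <- pnorm(7500, mean = 5000, sd = 2000, lower.tail = FALSE)

c(p_unif, p_norm, p_binom, p_pois, p_exp, p_norm_upper)
```

The lower and upper tails for the same `q` always add up to 1, which is a quick way to check that a call is set up as intended.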
```r
# Min and max wait times for back-up that happens every 30 min
min <- 0
max <- 30

# Calculate probability of waiting 10-20 mins
prob_between_10_and_20 <- punif(20, min, max) - punif(10, min, max)
prob_between_10_and_20
```
```r
# Set random seed to 334
set.seed(334)

# Generate 1000 wait times between 0 and 30 mins, save in time column
wait_times %>%
  mutate(time = runif(1000, min = 0, max = 30)) %>%
  # Create a histogram of simulated times
  ggplot(aes(time)) +
  geom_histogram(bins = 30)
```
```r
# Probability of deal < 7500
pnorm(7500, 5000, 2000, lower.tail = TRUE)

# Calculate new average amount
new_mean <- 5000 * 1.2

# Calculate new standard deviation
new_sd <- 2000 * 1.3

# Simulate 36 sales
new_sales <- new_sales %>%
  mutate(amount = rnorm(36, new_mean, new_sd))

# Create histogram with 10 bins
ggplot(new_sales, aes(amount)) +
  geom_histogram(bins = 10) # 10 bars

# Take 30 samples of 20 values of num_users, take mean of each sample
sample_means <- replicate(30, sample(all_deals$num_users, 20) %>% mean())
```

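The central limit theorem states that the sampling distribution of a sample statistic such as the mean approaches a normal distribution as the sample size grows, regardless of the shape of the distribution being sampled from. It assumes the samples are random and independent. Taking more samples does not change the sampling distribution itself; it only makes a histogram of the sample statistics a smoother picture of it.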
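The simulation below is a minimal sketch of the theorem in base R. The population is hypothetical (an exponential distribution with mean 10, chosen because it is strongly skewed), and the seed, sample size, and number of samples are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical skewed population: exponential with mean 10
population <- rexp(10000, rate = 0.1)

# Draw 1000 samples of 50 values each and record the mean of each sample
sample_means <- replicate(1000, mean(sample(population, 50)))

# The sample means centre on the population mean
mean(population)
mean(sample_means)

# Their histogram is roughly bell-shaped even though the population is skewed
hist(sample_means, breaks = 30, main = "Sampling distribution of the mean")
```

Changing the sample size from 50 to a small value such as 5 should leave the histogram visibly skewed, which shows that it is the sample size that drives the approach to normality.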