Introduction to Statistics in R

    The code below assumes the course datasets (food_consumption, wait_times, new_sales and all_deals) have already been loaded.

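    # Packages used throughout: dplyr for %>%, group_by(), summarize() and mutate(); ggplot2 for plots
    library(dplyr)
    library(ggplot2)
    
    # Use custom quantiles by passing a vector of probabilities to probs
    quantile(food_consumption$co2_emission, probs = seq(0, 1, 0.2))
    seq(0, 1, 0.2) # From 0 to 1 in steps of 0.2 (six values)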
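    A minimal sketch of how probs changes what quantile() returns, using a small made-up vector (x is hypothetical, not course data). By default quantile() returns quartiles; passing seq(0, 1, 0.2) returns quintiles instead.

    ```r
    # Hypothetical data: the whole numbers 1 to 11
    x <- 1:11

    # Default probs = c(0, 0.25, 0.5, 0.75, 1): the quartiles
    quartiles <- quantile(x)

    # probs = seq(0, 1, 0.2): the quintiles (0%, 20%, ..., 100%)
    quintiles <- quantile(x, probs = seq(0, 1, 0.2))

    quartiles
    quintiles
    ```

    For this vector the quartiles should come out as 1, 3.5, 6, 8.5 and 11, and the quintiles as 1, 3, 5, 7, 9 and 11.
    
    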
    
    # Calculate variance and standard deviation of co2_emission for each food_category
    food_consumption %>%
      group_by(food_category) %>%
      summarize(var_co2 = var(co2_emission),
                sd_co2 = sd(co2_emission))
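    To see what var() and sd() actually compute, here is a small sketch on hypothetical values (not the course data): the sample variance is the sum of squared distances from the mean divided by n - 1, and the standard deviation is its square root.

    ```r
    # Hypothetical emission values, for illustration only
    co2 <- c(2, 4, 4, 4, 5, 5, 7, 9)
    n <- length(co2)

    # Sample variance by hand: squared deviations from the mean, divided by n - 1
    var_by_hand <- sum((co2 - mean(co2))^2) / (n - 1)

    # Standard deviation: square root of the variance
    sd_by_hand <- sqrt(var_by_hand)

    # Both should agree with the built-in functions
    all.equal(var_by_hand, var(co2))
    all.equal(sd_by_hand, sd(co2))
    ```

    Here the mean is 5, the squared deviations add up to 32, so the variance is 32 / 7 (about 4.57).
    
    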
    
    # Create subgraphs for each food_category: histogram of co2_emission
    ggplot(food_consumption, aes(co2_emission)) +
      # Create a histogram
      geom_histogram() +
      # Create a separate sub-graph for each food_category
      facet_wrap(~ food_category)

    A probability distribution describes the probability of each possible outcome in a scenario. The expected value is the mean of the probability distribution.
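    As a worked example, assuming a fair six-sided die (a hypothetical scenario, not course data), each outcome 1 to 6 has probability 1/6. The expected value weights each outcome by its probability, and the mean of many simulated rolls should land close to it.

    ```r
    # Hypothetical example: a fair six-sided die
    die <- data.frame(n = 1:6, prob = rep(1/6, 6))

    # Expected value: each outcome times its probability, summed
    expected_value <- sum(die$n * die$prob)
    expected_value

    # The mean of many simulated rolls should be close to the expected value
    set.seed(10)
    rolls <- sample(die$n, 10000, replace = TRUE, prob = die$prob)
    mean(rolls)
    ```

    The expected value here is 3.5, even though 3.5 is never an actual roll.
    
    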

    # Cumulative distribution functions, P(X <= x), for common distributions
    # punif(): uniform
    # pnorm(): normal
    # pbinom(): binomial
    # ppois(): Poisson
    # pexp(): exponential
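    Each p function listed above has siblings with the same suffix: d (probability mass or density), q (quantile, the inverse of p) and r (random draws). A short sketch with the binomial, using arbitrary illustration values (10 fair coin flips):

    ```r
    # d: P(X = 5), exactly 5 heads in 10 fair flips
    p_exactly_5 <- dbinom(5, size = 10, prob = 0.5)

    # p: P(X <= 5), at most 5 heads
    p_at_most_5 <- pbinom(5, size = 10, prob = 0.5)

    # q: smallest x with P(X <= x) >= 0.5, i.e. the median number of heads
    median_heads <- qbinom(0.5, size = 10, prob = 0.5)

    # r: three simulated experiments of 10 flips each
    set.seed(42)
    simulated_heads <- rbinom(3, size = 10, prob = 0.5)
    ```

    The exact values are 252/1024 for exactly 5 heads and 638/1024 for at most 5 heads, with a median of 5.
    
    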
    # Min and max wait times for back-up that happens every 30 min
    min <- 0
    max <- 30
    
    # Calculate probability of waiting 10-20 mins
    prob_between_10_and_20 <- punif(20, min, max) - punif(10, min, max)
    prob_between_10_and_20
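    For a continuous uniform distribution, the probability of an interval is its length divided by the full range, so the result above should equal (20 - 10) / 30 = 1/3. A small sketch checking that, and using lower.tail = FALSE for a "more than" question (the 5-minute threshold and the names min_wait and max_wait are illustrative choices):

    ```r
    min_wait <- 0
    max_wait <- 30

    # Interval probability from the cumulative distribution function
    prob_between_10_and_20 <- punif(20, min_wait, max_wait) - punif(10, min_wait, max_wait)

    # Same result from interval length divided by the range
    by_length <- (20 - 10) / (max_wait - min_wait)

    # P(wait > 5): area to the right of 5
    prob_more_than_5 <- punif(5, min_wait, max_wait, lower.tail = FALSE)
    ```
    
    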
    # Set random seed to 334
    set.seed(334)
    
    # Generate 1000 wait times between 0 and 30 mins, save in time column
    wait_times %>%
      mutate(time = runif(1000, min = 0, max = 30)) %>%
      # Create a histogram of simulated times
      ggplot(aes(time)) +
      geom_histogram(bins = 30)
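    The histogram should look roughly flat. As a numeric check that does not need the wait_times data frame, here is a base-R sketch (sim_times is a hypothetical name) that compares the share of simulated times between 10 and 20 minutes with the exact answer of 1/3:

    ```r
    set.seed(334)

    # 10,000 simulated wait times, uniform between 0 and 30 minutes
    sim_times <- runif(10000, min = 0, max = 30)

    # Share of simulated times between 10 and 20 minutes
    prop_10_to_20 <- mean(sim_times >= 10 & sim_times <= 20)

    # Exact probability for comparison
    exact <- punif(20, 0, 30) - punif(10, 0, 30)

    prop_10_to_20
    exact
    ```

    With this many draws the simulated share should typically be within about one percentage point of 1/3.
    
    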
    # Probability that a deal is worth less than 7500, if amounts are normal with mean 5000 and sd 2000
    pnorm(7500, mean = 5000, sd = 2000, lower.tail = TRUE)
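    Because pnorm() returns the area to the left of a value, other questions follow from it. A sketch with the same mean of 5000 and standard deviation of 2000 (the 3000 and 7000 cut-offs and the 0.9 quantile are illustrative choices):

    ```r
    deal_mean <- 5000
    deal_sd <- 2000

    # P(amount < 7500): area to the left
    p_less_7500 <- pnorm(7500, mean = deal_mean, sd = deal_sd)

    # P(amount > 7500): area to the right
    p_more_7500 <- pnorm(7500, mean = deal_mean, sd = deal_sd, lower.tail = FALSE)

    # P(3000 < amount < 7000): difference of two left-hand areas (one sd either side)
    p_3000_to_7000 <- pnorm(7000, deal_mean, deal_sd) - pnorm(3000, deal_mean, deal_sd)

    # Amount that 90% of deals fall below: qnorm() inverts pnorm()
    amount_90th <- qnorm(0.9, mean = deal_mean, sd = deal_sd)
    ```

    The left and right areas add up to 1, the one-sd band holds about 68% of deals, and the 90th percentile is roughly 7563.
    
    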
    
    # New average amount: 20% higher than the old mean of 5000
    new_mean <- 5000 * 1.2
    
    # New standard deviation: 30% higher than the old sd of 2000
    new_sd <- 2000 * 1.3
    
    # Simulate 36 sales
    new_sales <- new_sales %>% 
      mutate(amount = rnorm(36, new_mean, new_sd))
    
    # Create histogram with 10 bins
    ggplot(new_sales, aes(amount)) +
      geom_histogram(bins = 10)
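    If the new_sales data frame is not loaded, a base-R sketch can build one from scratch. This assumes, as the code above implies, 36 sales drawn from a normal distribution with the new mean and standard deviation; the seed and the sale_num column are arbitrary choices. It also recomputes the probability of a deal under 7500 under the new distribution.

    ```r
    new_mean <- 5000 * 1.2  # 20% higher average: 6000
    new_sd <- 2000 * 1.3    # 30% higher spread: 2600

    set.seed(123)
    # One row per simulated sale
    new_sales <- data.frame(sale_num = 1:36,
                            amount = rnorm(36, mean = new_mean, sd = new_sd))

    # With only 36 draws, these will only roughly match 6000 and 2600
    mean(new_sales$amount)
    sd(new_sales$amount)

    # P(amount < 7500) under the new distribution
    p_new_less_7500 <- pnorm(7500, mean = new_mean, sd = new_sd)
    ```

    Compared with about 0.89 before, the probability of a deal under 7500 drops to roughly 0.72, because both the center and the spread moved up.
    
    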
    
    # Take 30 samples of 20 values of num_users, take mean of each sample
    sample_means <- replicate(30, sample(all_deals$num_users, 20) %>% mean())
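    Since all_deals is course data, here is a self-contained sketch of the same idea with a made-up right-skewed population (exponential with mean 10, a hypothetical stand-in for num_users). replicate() repeats the sampling and keeps each sample's mean.

    ```r
    set.seed(104)

    # Hypothetical skewed population of 5000 values with mean about 10
    population <- rexp(5000, rate = 1 / 10)

    # 30 samples of 20 values each (without replacement), keeping each mean
    sample_means <- replicate(30, mean(sample(population, 20)))

    # The sample means cluster around the population mean
    mean(population)
    mean(sample_means)
    ```
    
    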
    

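    The central limit theorem states that the sampling distribution of the sample mean (or sum) approaches a normal distribution as the size of each sample grows, no matter the shape of the original distribution being sampled from, provided it has a finite variance. Taking more samples makes the simulated sampling distribution a clearer picture of that shape.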
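    A rough sketch of the theorem with a strongly skewed population (exponential with rate 1, chosen only for illustration, which has skewness 2 and standard deviation 1). As the size of each sample grows, the sample means become less skewed and their spread shrinks like sigma / sqrt(n). The skewness() helper is hypothetical and defined here; it is not a base R function.

    ```r
    set.seed(2024)

    # Simple moment-based skewness: 0 for a symmetric distribution
    skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

    # 2000 sample means for samples of size n from Exp(rate = 1)
    means_for_n <- function(n) replicate(2000, mean(rexp(n, rate = 1)))

    small_n <- means_for_n(2)
    large_n <- means_for_n(100)

    # Skewness falls toward 0 as n grows (theory: 2 / sqrt(n))
    skewness(small_n)
    skewness(large_n)

    # Spread of the means is close to sigma / sqrt(n) = 1 / sqrt(100)
    sd(large_n)
    1 / sqrt(100)
    ```

    A histogram of large_n should look close to a bell curve, while small_n stays visibly right-skewed.
    
    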