
Amazon S3 Bucket Connection

This recipe shows you how to connect to an AWS S3 bucket. Using the Integrations tab, you can make this connection without exposing sensitive data; you can learn more about integrations here.

As an example, you can connect your workspace to a fictional online ticket sales dataset. The files for this dataset are stored in an S3 bucket hosted on a DataCamp server. The ER diagram of this sample database is shown in the appendix of this recipe, and the environment variables needed to connect to it are listed in this section of the documentation.

Once you are familiar with this example, you can also connect to your own Amazon S3 bucket by supplying your own credentials as environment variables.
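The aws.s3 package picks up the standard AWS environment variables. A minimal sketch of the variables involved is below; the values are placeholders, and `AWS_BUCKET_NAME` follows this recipe's naming convention rather than an AWS standard:

```r
# Placeholder values -- in practice, set these through the Integrations tab
# so the credentials never appear in the notebook itself.
Sys.setenv(
  "AWS_ACCESS_KEY_ID"     = "your-access-key-id",
  "AWS_SECRET_ACCESS_KEY" = "your-secret-access-key",
  "AWS_DEFAULT_REGION"    = "us-east-1",        # region of your bucket
  "AWS_BUCKET_NAME"       = "your-bucket-name"  # recipe-specific convention
)
```

Hard-coding credentials in a notebook defeats the purpose of the integration, so this block is only illustrative of which variables the connection relies on.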

Load packages

suppressPackageStartupMessages(library(tidyverse))
library(aws.s3)

Print out the files available in the workspacedemos3 directory.

AWS_BUCKET_NAME <- Sys.getenv("AWS_BUCKET_NAME")
files <- get_bucket_df(AWS_BUCKET_NAME, prefix = "workspacedemos3/")[["Key"]]
files

Define a function to load the data from a file into a data frame.

read_data_to_df <- function(file) {
  # Download the object from the bucket as a raw vector
  object <- get_object(bucket = AWS_BUCKET_NAME, object = file)
  # Wrap it in a connection and parse it as a pipe-delimited file
  raw <- rawConnection(object)
  df <- read_delim(raw, delim = "|", col_names = FALSE)
  return(df)
}
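As a quick check, the helper can be applied to a single key first. The file name below is hypothetical; substitute one of the keys actually returned in `files`:

```r
# Hypothetical key -- replace with an entry from `files`
sample_df <- read_data_to_df("workspacedemos3/category_pipe.txt")
head(sample_df)
```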

Convert the files to data frames.

categories <- read_data_to_df(files[1])
dates <- read_data_to_df(files[2])
events <- read_data_to_df(files[3])
listings <- read_data_to_df(files[4])
sales <- read_data_to_df(files[5])

Appendix

1. ER diagram of online ticket sales database

This ER diagram contains information about all the tables in the sample database and shows how they relate to each other (database and ER diagram source).
