Data Analyst with R
About this Workspace
This workspace records the knowledge required to apply for the Data Analyst Professional Certification.
# Import packages
library(dplyr)
1. Importing flat files using Utils (embedded library)
1.1 Importing .csv files
# Importing a simple .csv file using Utils
pools <- read.csv("swimming_pools.csv")
str(pools)
In the code below, we make sure that string columns are NOT loaded as factors.
The read.csv() function has an argument called stringsAsFactors.
In R versions before 4.0.0 this argument defaulted to TRUE; since R 4.0.0 the default is FALSE. Setting it explicitly to FALSE keeps string columns as character vectors regardless of the R version.
Tip: Setting the argument to TRUE makes sense when the string columns you are importing hold categorical values.
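To see the effect of stringsAsFactors without depending on swimming_pools.csv, here is a minimal sketch that reads a small inline CSV through the text argument of read.csv(). The column names and values are made up for illustration and are not the contents of the real file.

```r
# Hypothetical inline data; not the contents of swimming_pools.csv
csv_text <- "name,type\nPool A,indoor\nPool B,outdoor\nPool C,indoor"

# stringsAsFactors = TRUE: string columns are converted to factors
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# stringsAsFactors = FALSE: string columns stay character vectors
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Compare the resulting column classes
class(as_factors$type)
class(as_strings$type)

# A factor stores the distinct categories as levels
levels(as_factors$type)
```

A column such as the hypothetical type above is a reasonable candidate for a factor, while a free-text column such as name is usually better kept as character.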
# Import swimming_pools.csv correctly: pools
pools <- read.csv("swimming_pools.csv", stringsAsFactors = FALSE)
# Check the structure of pools
str(pools)
1.2 Importing .txt files
# Import hotdogs.txt: hotdogs
hotdogs <- read.delim(file = "hotdogs.txt",
                      header = FALSE,
                      sep = "\t",
                      stringsAsFactors = FALSE)
# Summarize hotdogs
summary(hotdogs)
In the code above we imported a tab-delimited .txt file. The file contains data without column names, which is why we set the header argument to FALSE.
We also used the base R function summary(), which returns summary statistics for each variable in the dataset.
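As a rough illustration of what header = FALSE does, the sketch below reads a small tab-delimited string instead of hotdogs.txt. The values and the column names passed to col.names are hypothetical; they only show that R falls back to default names (V1, V2, ...) unless names are supplied.

```r
# Hypothetical tab-delimited data without a header row
txt <- "Beef\t186\t495\nMeat\t173\t458\nPoultry\t129\t430"

# With header = FALSE and no names given, R assigns V1, V2, V3
no_names <- read.delim(text = txt, header = FALSE, stringsAsFactors = FALSE)
names(no_names)

# col.names supplies descriptive names at import time (names are assumptions)
named <- read.delim(text = txt, header = FALSE,
                    col.names = c("type", "calories", "sodium"),
                    stringsAsFactors = FALSE)

# summary() reports per-column statistics; numeric columns get min, quartiles, mean, max
summary(named)
```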
2. Importing flat files using Readr Library
2.1 Importing .csv files
# Load the readr package
library(readr)
# Import potatoes.csv with read_csv(): potatoes
potatoes <- read_csv("potatoes.csv")
2.2 Importing .txt files
# Import potatoes.txt using read_tsv
# Column names
properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")
# Import potatoes.txt: potatoes
potatoes <- read_tsv("potatoes.txt", col_names = properties)
# Call head() on potatoes
head(potatoes)
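The col_names argument matters because read_tsv() otherwise treats the first row as the header; the call above passes names explicitly, presumably because potatoes.txt has no header row. The sketch below shows the difference on a tiny inline example. The data and the names a, b, c are hypothetical, and wrapping the string in I() to mark it as literal data assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical inline data; I() marks the string as literal data (readr >= 2.0.0)
tsv_text <- I("1\t2\t3\n4\t5\t6")

# Without col_names, the first row is consumed as the header
first_row_as_header <- read_tsv(tsv_text, show_col_types = FALSE)

# With col_names, every row is kept as data
with_names <- read_tsv(tsv_text,
                       col_names = c("a", "b", "c"),
                       show_col_types = FALSE)

# The first call loses one data row to the header
nrow(first_row_as_header)
nrow(with_names)
```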