course
SQLite in R
As mentioned at the end of my previous tutorial, Beginners Guide to SQLite, SQLite databases are most powerful when used along R and Python. So far in DataCamp, you have been exposed to how to operate with SQLite databases from Python (see the SQLite in Python tutorial by Sayak Paul to learn how to manipulate SQLite databases through the sqlite3 package in Python). However, in this tutorial, we will focus on how to use SQLite databases in R using the RSQLite package.
We will go over the basics of how to perform essential tasks such as sending queries to a SQLite database or creating tables using RSQLite. Furthermore, I will cover how to use parameterized queries and operations like INSERT or DELETE that do not return tabular results.
Creating Databases and Tables
The first step as you may have guessed is to create a database. RSQLite can create ephemeral in-memory transient SQLite databases just as it happens when you open the SQLite command line. However, this is not usually what you want, so let's create a proper database for the mtcars dataset using the function dbConnect(), which takes the following arguments:
- drv: A database driver
- path: The path to a SQLite database. If you are creating a new one, simply give it a name of your choice as I do below. If you want to operate with a transient in-memory database; however, you can omit the path argument or type ":memory:").
# Load the RSQLite Library
library(RSQLite)
# Load the mtcars as an R data frame put the row names as a column, and print the header.
data("mtcars")
mtcars$car_names <- rownames(mtcars)
rownames(mtcars) <- c()
head(mtcars)
# Create a connection to our new database, CarsDB.db
# you can check that the .db file has been created on your working directory
conn <- dbConnect(RSQLite::SQLite(), "CarsDB.db")
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | car_names |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | Mazda RX4 |
21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | Mazda RX4 Wag |
22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | Datsun 710 |
21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | Hornet 4 Drive |
18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | Hornet Sportabout |
18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | Valiant |
Once you have the database created and your data in proper shape, you can go ahead and create a table inside the database using the dbWriteTable() function. This function can take multiple arguments, but, for now, let's focus on the following:
- conn: The connection to your SQLite database
- name: The name you want to use for your table
- value: The data that you want to insert. It should be an R data frame or an object coercible to an R data frame.
After that, you can use the function dbListTables() with the SQLite database connection as an argument to check that you have successfully created the table.
# Write the mtcars dataset into a table names mtcars_data
dbWriteTable(conn, "cars_data", mtcars)
# List all the tables available in the database
dbListTables(conn)
'cars_data'
An extremely useful feature when creating tables using RSQLite, is that you can append more data to an existing table using a loop in case that you have several data frames by setting the optional argument append = TRUE inside dbWriteTable(). For instance, let us create a new toy table with some cars and the manufacturers by appending two different data frames:
# Create toy data frames
car <- c('Camaro', 'California', 'Mustang', 'Explorer')
make <- c('Chevrolet','Ferrari','Ford','Ford')
df1 <- data.frame(car,make)
car <- c('Corolla', 'Lancer', 'Sportage', 'XE')
make <- c('Toyota','Mitsubishi','Kia','Jaguar')
df2 <- data.frame(car,make)
# Add them to a list
dfList <- list(df1,df2)
# Write a table by appending the data frames inside the list
for(k in 1:length(dfList)){
dbWriteTable(conn,"Cars_and_Makes", dfList[[k]], append = TRUE)
}
# List all the Tables
dbListTables(conn)
- 'Cars_and_Makes'
- 'cars_data'
Let us now make sure that all the data is in the new table:
dbGetQuery(conn, "SELECT * FROM Cars_and_Makes")
car | make |
---|---|
Camaro | Chevrolet |
California | Ferrari |
Mustang | Ford |
Explorer | Ford |
Corolla | Toyota |
Lancer | Mitsubishi |
Sportage | Kia |
XE | Jaguar |
Executing SQL Queries
As you have seen above, it is possible to execute valid SQL queries through RSQLite using the dbGetQuery() function, which has the following arguments:
- conn: The connection to the SQLite Database
- query: The SQL query you want to execute as a string
To further showcase the capability to execute SQL queries with RSQLite, let's go over some more query examples on the cars_data table:
Note: It is important to keep in mind that, through RSQLIte, you can execute any query that is valid for a SQLite database from simple SELECT statements to JOINS (Except RIGHT OUTER JOINS and FULL OUTER JOINS, which are not allowed in SQLite).
# Gather the first 10 rows in the cars_data table
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | car_names |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | Mazda RX4 |
21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | Mazda RX4 Wag |
22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | Datsun 710 |
21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | Hornet 4 Drive |
18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | Hornet Sportabout |
18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | Valiant |
14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 | Duster 360 |
24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 | Merc 240D |
22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 | Merc 230 |
19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 | Merc 280 |
# Get the car names and horsepower of the cars with 8 cylinders
dbGetQuery(conn,"SELECT car_names, hp, cyl FROM cars_data
WHERE cyl = 8")
car_names | hp | cyl |
---|---|---|
Hornet Sportabout | 175 | 8 |
Duster 360 | 245 | 8 |
Merc 450SE | 180 | 8 |
Merc 450SL | 180 | 8 |
Merc 450SLC | 180 | 8 |
Cadillac Fleetwood | 205 | 8 |
Lincoln Continental | 215 | 8 |
Chrysler Imperial | 230 | 8 |
Dodge Challenger | 150 | 8 |
AMC Javelin | 150 | 8 |
Camaro Z28 | 245 | 8 |
Pontiac Firebird | 175 | 8 |
Ford Pantera L | 264 | 8 |
Maserati Bora | 335 | 8 |
# Get the car names and horsepower starting with M that have 6 or 8 cylinders
dbGetQuery(conn,"SELECT car_names, hp, cyl FROM cars_data
WHERE car_names LIKE 'M%' AND cyl IN (6,8)")
car_names | hp | cyl |
---|---|---|
Mazda RX4 | 110 | 6 |
Mazda RX4 Wag | 110 | 6 |
Merc 280 | 123 | 6 |
Merc 280C | 123 | 6 |
Merc 450SE | 180 | 8 |
Merc 450SL | 180 | 8 |
Merc 450SLC | 180 | 8 |
Maserati Bora | 335 | 8 |
# Get the average horsepower and mpg by number of cylinder groups
dbGetQuery(conn,"SELECT cyl, AVG(hp) AS 'average_hp', AVG(mpg) AS 'average_mpg' FROM cars_data
GROUP BY cyl
ORDER BY average_hp")
cyl | average_hp | average_mpg |
---|---|---|
4 | 82.63636 | 26.66364 |
6 | 122.28571 | 19.74286 |
8 | 209.21429 | 15.10000 |
To store the results of your queries to do further operations in R down the line as a data frame, it is as simple as assigning the result of your query to a variable.
avg_HpCyl <- dbGetQuery(conn,"SELECT cyl, AVG(hp) AS 'average_hp'FROM cars_data
GROUP BY cyl
ORDER BY average_hp")
avg_HpCyl
class(avg_HpCyl)
cyl | average_hp |
---|---|
4 | 82.63636 |
6 | 122.28571 |
8 | 209.21429 |
'data.frame'
Insert Variables into Queries (Parameterised Queries)
One of the greatest advantages of operating with SQLite databases from R is the ability to use parameterized queries. That is, the ability to gather variables available in your R workspace and use them to query your SQLite database. Let me show an example of how to use a variable inside a SQLite query:
# Lets assume that there is some user input that asks us to look only into cars that have over 18 miles per gallon (mpg)
# and more than 6 cylinders
mpg <- 18
cyl <- 6
Result <- dbGetQuery(conn, 'SELECT car_names, mpg, cyl FROM cars_data WHERE mpg >= ? AND cyl >= ?', params = c(mpg,cyl))
Result
car_names | mpg | cyl |
---|---|---|
Mazda RX4 | 21.0 | 6 |
Mazda RX4 Wag | 21.0 | 6 |
Hornet 4 Drive | 21.4 | 6 |
Hornet Sportabout | 18.7 | 8 |
Valiant | 18.1 | 6 |
Merc 280 | 19.2 | 6 |
Pontiac Firebird | 19.2 | 8 |
Ferrari Dino | 19.7 | 6 |
As you can see the only difference between sending a normal query and a parameterized query lies in the placeholder value in the query (>= ?) and the params argument of dbGetQuery(), which takes a list or a vector with the values you want to assign to the placeholders (in this case we have a vector containing the mpg and cyl variables).
Now, what happens when you want to execute a different query? In the example above, as you can see, I pretty much hand-crafted the query. It just allows the input of mpg and cyl values, and the query will only retrieve the cars with higher or equal mpg and cyl. However, there may be situations where you may want to be more flexible. What happens if a user wants to look at the cars that have the higher or equal horsepower and weight than a specific value as well? With the query above, you would have to go back and rewrite it, but you could write a function that would make that step unnecessary. Let's take a look at one:
# Assemble an example function that takes the SQLite database connection, a base query,
# and the parameters you want to use in the WHERE clause as a list
assembleQuery <- function(conn, base, search_parameters){
parameter_names <- names(search_parameters)
partial_queries <- ""
# Iterate over all the parameters to assemble the query
for(k in 1:length(parameter_names)){
filter_k <- paste(parameter_names[k], " >= ? ")
# If there is more than 1 parameter, add an AND statement before the parameter name and placeholder
if(k > 1){
filter_k <- paste("AND ", parameter_names[k], " >= ?")
}
partial_queries <- paste(partial_queries, filter_k)
}
# Paste all together into a single query using a WHERE statement
final_paste <- paste(base, " WHERE", partial_queries)
# Print the assembled query to show how it looks like
print(final_paste)
# Run the final query. I unlist the values from the search_parameters list into a vector since it is needed
# when using various anonymous placeholders (i.e. >= ?)
values <- unlist(search_parameters, use.names = FALSE)
result <- dbGetQuery(conn, final_paste, params = values)
# return the executed query
return(result)
}
base <- "SELECT car_names, mpg, hp, wt FROM cars_data"
search_parameters <- list("mpg" = 16, "hp" = 150, "wt" = 2.1)
result <- assembleQuery(conn, base, search_parameters)
result
[1] "SELECT car_names, mpg, hp, wt FROM cars_data WHERE mpg >= ? AND hp >= ? AND wt >= ?"
car_names | mpg | hp | wt |
---|---|---|---|
Hornet Sportabout | 18.7 | 175 | 3.440 |
Merc 450SE | 16.4 | 180 | 4.070 |
Merc 450SL | 17.3 | 180 | 3.730 |
Pontiac Firebird | 19.2 | 175 | 3.845 |
Ferrari Dino | 19.7 | 175 | 2.770 |
The function above may be basic, but it illustrates how it is possible to write R code that generates SQL queries that can be executed on a SQLite database. I encourage you to keep exploring if you are interested in this use case. The function above can be enhanced in several ways starting with the removal of the assumption that the user providing the parameters will want to see the values that are higher or equal than those inputted.
Statements that do not Return Tabular Results
Sometimes, you may want to execute SQL queries that do not necessarily return tabular data. Examples of these operations are inserting, updating or deleting table records. For this, we can use the function dbExecute(), which takes as arguments a SQLite database connection and a SQL query. A couple of examples can be seen below:
# Visualize the table before deletion
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")
# Delete the column belonging to the Mazda RX4. You will see a 1 as the output.
dbExecute(conn, "DELETE FROM cars_data WHERE car_names = 'Mazda RX4'")
# Visualize the new table after deletion
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | car_names |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | Mazda RX4 |
21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | Mazda RX4 Wag |
22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | Datsun 710 |
21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | Hornet 4 Drive |
18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | Hornet Sportabout |
18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | Valiant |
14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 | Duster 360 |
24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 | Merc 240D |
22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 | Merc 230 |
19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 | Merc 280 |
1
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | car_names |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | Mazda RX4 Wag |
22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | Datsun 710 |
21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | Hornet 4 Drive |
18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | Hornet Sportabout |
18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | Valiant |
14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 | Duster 360 |
24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 | Merc 240D |
22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 | Merc 230 |
19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 | Merc 280 |
17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 | Merc 280C |
Now lets try inserting the data of the Mazda RX4 back:
# Insert the data for the Mazda RX4. This will also ouput a 1
dbExecute(conn, "INSERT INTO cars_data VALUES (21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4,'Mazda RX4')")
# See that we re-introduced the Mazda RX4 succesfully at the end
dbGetQuery(conn, "SELECT * FROM cars_data")
1
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | car_names |
---|---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | Mazda RX4 Wag |
22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | Datsun 710 |
21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | Hornet 4 Drive |
18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | Hornet Sportabout |
18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | Valiant |
14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 | Duster 360 |
24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 | Merc 240D |
22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 | Merc 230 |
19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 | Merc 280 |
17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 | Merc 280C |
16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 | Merc 450SE |
17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 | Merc 450SL |
15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 | Merc 450SLC |
10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 | Cadillac Fleetwood |
10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 | Lincoln Continental |
14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 | Chrysler Imperial |
32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 | Fiat 128 |
30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 | Honda Civic |
33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 | Toyota Corolla |
21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 | Toyota Corona |
15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 | Dodge Challenger |
15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 | AMC Javelin |
13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 | Camaro Z28 |
19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 | Pontiac Firebird |
27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 | Fiat X1-9 |
26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 | Porsche 914-2 |
30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 | Lotus Europa |
15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 | Ford Pantera L |
19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 | Ferrari Dino |
15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 | Maserati Bora |
21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 | Volvo 142E |
21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | Mazda RX4 |
As you can see, the last line of code added the Mazda RX4 to the end of the table. Once you are done operating with your SQLite database within R, it is important to call the function dbDisconnect(). This ensures that we release any resources that the database connection has been using, which is always a good practice.
# Close the database connection to CarsDB
dbDisconnect(conn)
Conclusion
In this tutorial, we have covered the essential functions needed to manipulate SQLite databases in R using RSQLite. SQLite databases, if appropriately used, are an incredibly useful tool when combined inside your R scripts. An example of the power of the intersection between R and SQLite can be seen in parameterized queries, which could be used if you need to query a database in order to display information based on user input inside an R Shiny app. Another example use case or parameterized queries could be virtual assistants or chatbots. If you want to learn more about this, I encourage you to look into DataCamp's Building Chatbots with Python course.
Writing SQLite tables by appending data frames, can also be very powerful. As I noted in the Beginners Guide to SQLite tutorial, this capability allowed me to complete a social network analysis project by appending the followers of several Twitter users of interest to a table as I gathered them. I used a SQLite database for this, to save time by avoiding the need to start from scratch in the case of a power outage or Windows update, which could forcefully shut down my computer. If something like that happened, all that I had to do was restart gathering the followers of the user that was next to the last one that I had in the database. To put it into perspective, the computation time to gather the followers of all my Twitter users of interest, took nearly 4 weeks to complete due to the rate limits of the Twitter API. Therefore, starting from zero due to a power outage or Windows update would have been very undesirable.
As always, I encourage you to keep learning about SQLite databases and how to interact with them through R. I strongly encourage you to dive deeper an check the previous tutorials covering their usage with Python and the command line, which can further help you become a full SQLite wiz. Keep learning; the sky is the limit!
R Courses
course
Intermediate R
course
Introduction to Importing Data in R
blog
R vs SQL - Which Should I Learn?
tutorial
How to Use SQLite in Python
tutorial
Beginners Guide to SQLite
tutorial
SQL Database Overview Tutorial
DataCamp Team
3 min
tutorial
How to Execute SQL Queries in Python and R Tutorial
tutorial