Course Notes

Use this workspace to take notes, store sample queries, and build your own interactive cheat sheet!

You will need to connect your SQL cells to an integration to run a query.

  • You can use a sample integration from the dropdown menu. This includes the Course Databases integration, which contains tables you used in our SQL courses.
  • You can connect your own integration by following the instructions provided here.

Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but cannot make any changes such as adding, deleting, or modifying the data (e.g., creating tables, views, etc.).

Take Notes

Add notes here about the concepts you've learned and SQL cells with code you want to keep.

-- A sample query for you to replace!
SELECT 
    *
FROM books
SELECT title
FROM "films - Copy.csv"

# create a list of the heights of 20 individuals
heights = [5.6, 6.2, 5.9, 5.4, 6.1, 5.8, 5.7, 6.0, 5.5, 6.3, 5.9, 6.2, 5.6, 5.8, 6.1, 5.7, 5.4, 6.0, 5.5, 6.3]

# create a list of the weights of the same 20 individuals
weights = [65.2, 70.5, 68.9, 72.1, 75.3, 68.0, 71.2, 73.4, 69.8, 74.5, 70.1, 72.9, 67.5, 69.3, 73.1, 70.8, 72.6, 68.4, 71.7, 74.2]
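
The two lists above are defined but not used anywhere else in these notes. As a minimal sketch of one thing that could be done with them, the block below pairs each height with the weight at the same position and summarises both lists using only the standard library. The names `people`, `mean_height`, `mean_weight`, `sd_height` and `sd_weight` are introduced here for illustration, and the notes do not state units, so none are assumed.

```python
from statistics import mean, stdev

heights = [5.6, 6.2, 5.9, 5.4, 6.1, 5.8, 5.7, 6.0, 5.5, 6.3,
           5.9, 6.2, 5.6, 5.8, 6.1, 5.7, 5.4, 6.0, 5.5, 6.3]
weights = [65.2, 70.5, 68.9, 72.1, 75.3, 68.0, 71.2, 73.4, 69.8, 74.5,
           70.1, 72.9, 67.5, 69.3, 73.1, 70.8, 72.6, 68.4, 71.7, 74.2]

# pair each height with the weight at the same index (assumes the lists are aligned)
people = list(zip(heights, weights))

# summary statistics for each list
mean_height = mean(heights)
mean_weight = mean(weights)
sd_height = stdev(heights)  # sample standard deviation
sd_weight = stdev(weights)

print(f"n = {len(people)}")
print(f"height: mean = {mean_height:.2f}, sd = {sd_height:.2f}")
print(f"weight: mean = {mean_weight:.2f}, sd = {sd_weight:.2f}")
```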
import pandas as pd

songs = pd.read_csv("nigerian_spotify_songs1.csv")
songs
import matplotlib.pyplot as plt

# group the songs by artist_top_genre and calculate the mean popularity for each genre
genre_popularity = songs.groupby("artist_top_genre")["popularity"].mean()

# create a bar plot of the mean popularity for each genre
plt.bar(genre_popularity.index, genre_popularity.values)

# set the x-axis label
plt.xlabel("Artist Top Genre")

# set the y-axis label
plt.ylabel("Popularity")

# set the title of the plot
plt.title("Mean Popularity by Artist Top Genre")

# rotate the x-axis labels to avoid overlapping
plt.xticks(rotation=90)

# display the plot
plt.show()
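
To make the grouping step above concrete, here is a rough sketch of what a group-by-then-mean computes, written with plain Python instead of pandas so each step is visible. The rows are made up for illustration and are not taken from `nigerian_spotify_songs1.csv`; the genre labels and the names `rows`, `genre_means` and `ranked` are hypothetical.

```python
from collections import defaultdict

# hypothetical (artist_top_genre, popularity) rows, NOT real data from the CSV
rows = [
    ("afropop", 40),
    ("afro dancehall", 30),
    ("afropop", 60),
    ("nigerian pop", 20),
    ("afro dancehall", 50),
]

# collect the popularity values that belong to each genre
by_genre = defaultdict(list)
for genre, popularity in rows:
    by_genre[genre].append(popularity)

# average the values within each genre (the equivalent of groupby(...).mean())
genre_means = {genre: sum(vals) / len(vals) for genre, vals in by_genre.items()}

# order genres from highest to lowest mean, which often makes a bar chart easier to read
ranked = sorted(genre_means.items(), key=lambda item: item[1], reverse=True)

for genre, avg in ranked:
    print(f"{genre}: {avg:.1f}")
```

Sorting before plotting is an optional design choice; the cell above plots the genres in whatever order the grouped index provides.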
import pandas as pd

energy_data = pd.read_csv("energydata_complete.csv")

energy_data.head()
from sklearn.linear_model import LinearRegression

# use T2 as the single predictor and T6 as the target
X = energy_data[["T2"]]
y = energy_data["T6"]

# create a linear regression model
model = LinearRegression()

# fit the model to the data
model.fit(X, y)

# calculate the R^2 value and round to 2 decimal places
r_squared = round(model.score(X, y), 2)

# display the R^2 value (a notebook cell shows its last expression)
r_squared
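
For reference, the R^2 that `model.score` reports is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below works through that arithmetic for a one-predictor least-squares fit using only built-in Python. The `x` and `y` values are made up for illustration and are not taken from `energydata_complete.csv`; every variable name here is an assumption of this sketch.

```python
# made-up illustrative data, NOT values from energydata_complete.csv
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# ordinary least squares for a single predictor
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# fitted values from the line
predictions = [slope * xi + intercept for xi in x]

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared_manual = 1 - ss_res / ss_tot

print(round(r_squared_manual, 2))
```

Note that the cell above scores the model on the same rows it was fitted on, so its R^2 describes fit quality on the training data rather than performance on unseen data.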
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# drop the date and lights columns so that only the remaining columns are scaled and modelled
energy_data = energy_data.drop(["date", "lights"], axis=1)

# create the MinMaxScaler object
scaler = MinMaxScaler()

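# normalize every column to the [0, 1] range
# (note: the scaler is fitted on the full dataset, before the train/test split below)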
energy_data_norm = pd.DataFrame(scaler.fit_transform(energy_data), columns=energy_data.columns)

# set the target variable
y = energy_data_norm["Appliances"]

# set the predictor variables
X = energy_data_norm.drop("Appliances", axis=1)

# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# create a linear regression model
model = LinearRegression()

# fit the model to the training data
model.fit(X_train, y_train)

# evaluate the model on the test data
score = model.score(X_test, y_test)

# display the R^2 value on the test set
score
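
The cell above fits the scaler on the whole dataset and only then splits it, so the test rows contribute to the minimum and maximum used for scaling. A common alternative, sketched below under stated assumptions, is to split first and learn the scaling range from the training rows only. With scikit-learn this would mean calling `fit` on the training split and `transform` on both splits; the block shows the same idea in plain Python so the arithmetic is visible. The data and all names (`values`, `train_rows`, `test_rows`, `scale`) are hypothetical.

```python
import random

# made-up single-feature data, purely illustrative
values = [12.0, 7.5, 3.0, 18.0, 9.0, 15.0, 6.0, 21.0, 4.5, 10.5]

# shuffle reproducibly, then split 70% / 30% before any scaling happens
rng = random.Random(42)
shuffled = values[:]
rng.shuffle(shuffled)
split = len(shuffled) * 7 // 10
train_rows, test_rows = shuffled[:split], shuffled[split:]

# learn the scaling range from the training rows only
train_min, train_max = min(train_rows), max(train_rows)

def scale(value):
    """Min-max scale a value using the range learned from the training rows."""
    return (value - train_min) / (train_max - train_min)

train_scaled = [scale(v) for v in train_rows]
test_scaled = [scale(v) for v in test_rows]  # may fall outside [0, 1]

print(train_scaled)
print(test_scaled)
```

Scaled test values can land slightly below 0 or above 1 with this approach; that is expected, because the test rows played no part in choosing the range.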
from sklearn.metrics import mean_absolute_error

# make predictions on the test set
y_pred = model.predict(X_test)

# calculate the mean absolute error and round to 2 decimal places
mae = round(mean_absolute_error(y_test, y_pred), 2)

# display the mean absolute error
mae
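
As a small worked example of what `mean_absolute_error` computes, the sketch below averages the absolute differences between true and predicted values by hand and, for comparison, also computes the root mean squared error. The numbers are made up for illustration and are not model output; `y_true_demo`, `y_pred_demo`, `mae_manual` and `rmse_manual` are names introduced here.

```python
from math import sqrt

# made-up values for illustration only, NOT predictions from the model above
y_true_demo = [0.10, 0.25, 0.40, 0.30]
y_pred_demo = [0.12, 0.20, 0.35, 0.45]

# absolute error for each pair of true and predicted values
abs_errors = [abs(t - p) for t, p in zip(y_true_demo, y_pred_demo)]

# mean absolute error: the average of the absolute errors
mae_manual = sum(abs_errors) / len(abs_errors)

# root mean squared error: penalises large errors more heavily than MAE
rmse_manual = sqrt(sum(e ** 2 for e in abs_errors) / len(abs_errors))

print(f"MAE  = {mae_manual:.4f}")
print(f"RMSE = {rmse_manual:.4f}")
```

Because the target in the cell above was min-max scaled, its errors are on a 0 to 1 scale, so rounding to 2 decimal places can hide most of the detail; keeping more digits may be worth considering.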