Introduction to Data Science in Python

Run the hidden code cell below to import the data used in this course.

# Importing pandas and numpy
import numpy as np
import pandas as pd

# Importing the course datasets
frequencies = pd.read_csv("datasets/all_frequencies.csv")
records = pd.read_csv("datasets/cell_phone_records.csv")
credit = pd.read_csv("datasets/credit_records.csv")
ransom = pd.read_csv("datasets/ransom.csv")
gravel = pd.read_csv("datasets/shoe_gravel_sample.csv")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Matplotlib code

Creating a line chart

# Importing the module
from matplotlib import pyplot as plt
# or,
# import matplotlib.pyplot as plt

# Creating a line chart
plt.plot(ransom.letter, ransom.frequency, label='Frequency', color='teal')

# Adding axis labels and a title
plt.xlabel('Letter') # x-axis label
plt.ylabel('Frequency') # y-axis label
plt.title('Ransom Note Letters', fontsize=16) # plot title

# Setting the y-tick values
plt.yticks([0, 4, 8, 12])

# Optional: use a log scale on the x-axis (only meaningful for numeric x data)
# plt.xscale('log')

# Adding text to the plot at data coordinates x=5, y=9
plt.text(5, 9, 'Unusually low H frequency!')

# Adding legend
plt.legend()

# Displaying the plot
plt.show()
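
The same kind of chart can also be built with Matplotlib's object-oriented interface, which can be easier to manage once a figure holds more than one plot. The sketch below is only an illustration: the `letters` and `counts` lists are made-up placeholder values standing in for the `ransom` dataset, and the title is hypothetical.

```python
import matplotlib.pyplot as plt

# Placeholder values standing in for ransom.letter and ransom.frequency
# (made up for illustration; not the course data)
letters = ["A", "B", "C", "D"]
counts = [7, 1, 3, 4]

# Create a figure and a single set of axes, then draw on the axes object
fig, ax = plt.subplots()
ax.plot(letters, counts, label="Frequency", color="teal")

# Axes methods mirror the pyplot functions: xlabel -> set_xlabel, and so on
ax.set_xlabel("Letter")
ax.set_ylabel("Frequency")
ax.set_title("Sample Letter Frequencies", fontsize=16)
ax.set_yticks([0, 4, 8])
ax.legend()

# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```

Each `plt.` call in the notes above has an `ax.` counterpart, so switching between the two styles is mostly a matter of renaming.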

Styling for matplotlib

# Each call below shows one styling option. Uncomment one at a time,
# or combine several keyword arguments in a single call (as in the last line).

# Changing the line color with the color argument
# plt.plot(ransom.letter, ransom.frequency, label='Frequency', color='teal')

# Changing the line width with the linewidth argument
# plt.plot(ransom.letter, ransom.frequency, label='Frequency', color='teal', linewidth=10)

# Changing the line style with the linestyle argument: '-', '--', '-.', ':'
# plt.plot(ransom.letter, ransom.frequency, label='Frequency', color='teal', linestyle='--')

# Adding markers with the marker argument: 'x', 's', 'o', 'd', '*', 'h'
plt.plot(ransom.letter, ransom.frequency, label='Frequency', color='teal', linestyle='--', marker='x')

# Showing the plot
plt.show()

Histogram

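# Plotting a histogram of the gravel radii
plt.hist(gravel.radius,
         bins=40,         # number of bins
         range=(0, 10),   # lower and upper bounds of the bins
         density=True)    # normalize so the total bar area is 1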

plt.show()
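
To make the effect of `density=True` concrete, the sketch below computes the same kind of normalized histogram with `np.histogram`. The radii are randomly generated as a stand-in for `gravel.radius` (an assumption for illustration, not the course data). Under density normalization, each bar height is the bin count divided by the total count times the bin width, so the bar areas sum to 1.

```python
import numpy as np

# Made-up radii standing in for gravel.radius (assumption, not course data)
rng = np.random.default_rng(seed=0)
radii = rng.uniform(0, 10, size=1000)

# Same binning as the plt.hist call: 40 bins spanning 0 to 10
density, edges = np.histogram(radii, bins=40, range=(0, 10), density=True)
widths = np.diff(edges)

# Bar areas (height * width) add up to 1 under density normalization
total_area = float(np.sum(density * widths))
print(round(total_area, 6))
```

Because the heights are densities rather than counts, histograms of samples with different sizes can be compared on the same scale.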

Dictionaries