Introduction to Data Science in Python
Run the hidden code cell below to import the data used in this course.
# Importing pandas and numpy
import numpy as np
import pandas as pd
# Importing the course datasets
frequencies = pd.read_csv("datasets/all_frequencies.csv")
records = pd.read_csv("datasets/cell_phone_records.csv")
credit = pd.read_csv("datasets/credit_records.csv")
ransom = pd.read_csv("datasets/ransom.csv")
gravel = pd.read_csv("datasets/shoe_gravel_sample.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
# Add your code snippets here
import numpy as np  # np is the conventional alias for numpy
# The pandas module creates DataFrames and reads comma-separated (CSV) files.
# To take a first look at a DataFrame, use the following:
frequencies.head()  # head() returns the first rows of the DataFrame (5 by default).
frequencies.tail()  # tail() returns the last rows of the DataFrame (5 by default).
frequencies.info()  # info() prints the column names, data types, non-null counts, and index range.
frequencies.shape  # shape is an attribute holding (rows, columns). For example, (22, 33) means 22 rows and 33 columns.
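The inspection calls above can be tried without the course files. The following is a minimal sketch on a small made-up DataFrame; the name `sample` and its columns are hypothetical and are not part of the course datasets.

```python
import pandas as pd

# Hypothetical data, invented for illustration (not a course dataset).
sample = pd.DataFrame({
    "letter": ["a", "b", "c", "d", "e", "f", "g"],
    "count": [3, 1, 4, 1, 5, 9, 2],
})

first_rows = sample.head()     # first 5 rows by default
last_rows = sample.tail(2)     # pass a number to change how many rows come back
n_rows, n_cols = sample.shape  # shape is an attribute, so no parentheses

print(first_rows)
print(last_rows)
sample.info()                  # prints its summary directly and returns None
print(n_rows, n_cols)          # 7 2
```

Note that `head()` and `tail()` return new DataFrames, while `info()` only prints, so assigning its result to a variable gives `None`.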
# Selecting specific columns in a pandas DataFrame.
# Load a CSV file to use in the following examples.
import pandas as pd
gravel = pd.read_csv("datasets/shoe_gravel_sample.csv")
print(gravel)
# Gathering some information about the DataFrame:
gravel.head()
gravel.tail()
gravel.info()
gravel.shape
# If a column name is a valid Python identifier (only letters, digits, and underscores, not starting with a digit), dot notation can be used.
gravel.radius
# Bracket notation selects the same column:
# If the column name contains spaces or special characters, or clashes with a DataFrame attribute or method, bracket notation is required.
gravel_radius = gravel['radius']  # the column name goes inside square brackets as a string
print(gravel_radius)
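To see where dot notation stops working, here is a small sketch on made-up data. The DataFrame `stones` and its column names are hypothetical; they are chosen only to show a name with a space and a name that collides with an existing DataFrame method.

```python
import pandas as pd

# Hypothetical data, invented for illustration (not a course dataset).
stones = pd.DataFrame({
    "radius": [1.5, 2.0, 2.5],
    "stone type": ["granite", "basalt", "quartz"],  # name contains a space
    "count": [10, 20, 30],                          # same name as the DataFrame.count method
})

radius_dot = stones.radius          # dot notation works for a plain identifier
radius_brackets = stones["radius"]  # bracket notation returns the same column
stone_type = stones["stone type"]   # only brackets work here because of the space
counts = stones["count"]            # stones.count would give the method, not the column

print(radius_dot.equals(radius_brackets))  # True
print(callable(stones.count))              # True: this is the method, not the data
```

Bracket notation works for every column name, which is why it is the safer default.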
# Logical statements in Python.
# Boolean values are True or False.
# Python has several comparison operators:
# Greater than or equal to >=
# Less than or equal to <=
# Greater than >
# Less than <
# Use == to check whether two values are equal, and != to check whether they differ.
distance_A = 200 # km
distance_B = 160 # km
scenario_1 = distance_A == distance_B # check if they are equal.
print(scenario_1)
scenario_2 = distance_A <= distance_B # check if distance_A is less than or equal to distance_B.
print(scenario_2)
scenario_3 = distance_A >= distance_B # check if distance_A is greater than or equal to B.
print(scenario_3)
# For strings:
name = 'Ahmet'
name_2 = 'Mustafa'
name != name_2  # True, because the two strings differ
name == name_2  # False
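As a quick check of the operators listed above, the sketch below evaluates each one on the same pair of numbers and then compares strings. The variable names are illustrative; the values match the distances used earlier.

```python
distance_a = 200  # km
distance_b = 160  # km

# Every comparison evaluates to a Boolean: True or False.
results = {
    "==": distance_a == distance_b,
    "!=": distance_a != distance_b,
    ">": distance_a > distance_b,
    "<": distance_a < distance_b,
    ">=": distance_a >= distance_b,
    "<=": distance_a <= distance_b,
}
for operator, value in results.items():
    print(operator, value)

# String comparison is case-sensitive.
same_case = "Ahmet" == "Ahmet"   # True
mixed_case = "Ahmet" == "ahmet"  # False
print(same_case, mixed_case)
```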
# Creating plots with matplotlib in Python.
# First, matplotlib must be imported in order to use it.
import pandas as pd
from matplotlib import pyplot as plt
# Load the DataFrame:
ransom = pd.read_csv("datasets/ransom.csv")
# Gathering information about the DataFrame.
ransom.head()
ransom.tail()
ransom.info()
ransom.shape
ransom.describe()
# Creating Plots and adding labels:
plt.plot(ransom.letter_index, ransom.letter)
plt.xlabel("Letter Index")
plt.ylabel("Letter")
plt.title("Ransom")
# Show the plot.
plt.show()
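The same kind of labelled line plot can be sketched without the course files. The data below is made up, and the names `letters`, `fig`, and `ax` are hypothetical. This version uses matplotlib's `plt.subplots()` interface, which sets labels on an axes object instead of calling `plt.xlabel` and friends; the resulting plot is equivalent.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical data, invented for illustration (not the ransom dataset).
letters = pd.DataFrame({
    "letter": ["a", "b", "c", "d"],
    "frequency": [7, 2, 5, 3],
})

fig, ax = plt.subplots()                           # one figure with one set of axes
ax.plot(letters["letter"], letters["frequency"])   # x values first, then y values
ax.set_xlabel("Letter")
ax.set_ylabel("Frequency")
ax.set_title("Letter frequencies (made-up data)")

# Show the plot.
plt.show()
```

Labels and the title have to be set before `plt.show()`, since that call is what renders the figure.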
Tomorrow, create a notebook with Use a Dataset and add all the details yourself.