Python Data Science Toolbox (Part 2)
Run the hidden code cell below to import the data used in this course.
```python
# Import the course packages
import pandas as pd
import matplotlib.pyplot as plt

# Import the course datasets
world_ind = pd.read_csv('datasets/world_ind_pop_data.csv')
tweets = pd.read_csv('datasets/tweets.csv')
```

Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
```python
import pandas as pd

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}
    # Iterate over the file chunk by chunk (c_size rows per chunk)
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # Iterate over the column in the current DataFrame chunk
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1
    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries("datasets/tweets.csv", c_size=10, colname="lang")
# Print result_counts
print(result_counts)
```
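Because `count_entries()` reads from a file on disk, it is awkward to check in isolation. The sketch below is an assumption, not part of the course material: it passes `pd.read_csv` an in-memory CSV through `io.StringIO` (a file-like object that `read_csv` accepts) and accumulates the per-chunk counts with `collections.Counter` instead of a hand-written `if`/`else`. The helper name `count_entries_counter` and the sample rows are hypothetical.

```python
import io
from collections import Counter

import pandas as pd

# Made-up rows standing in for datasets/tweets.csv (hypothetical data)
SAMPLE_CSV = "lang,text\nen,hello\net,tere\nen,hi\nund,???\nen,hey\n"


def count_entries_counter(csv_source, c_size, colname):
    """Count the values of `colname`, reading `c_size` rows at a time (hypothetical helper)."""
    counts = Counter()
    # With chunksize set, read_csv yields DataFrames of at most c_size rows
    for chunk in pd.read_csv(csv_source, chunksize=c_size):
        # Counter.update() adds one to the count of every value in the list
        counts.update(chunk[colname].tolist())
    return dict(counts)


result = count_entries_counter(io.StringIO(SAMPLE_CSV), c_size=2, colname="lang")
print(result)  # {'en': 3, 'et': 1, 'und': 1}
```

With `c_size=2` the five rows arrive as chunks of 2, 2 and 1, yet the totals match what a single pass over the whole column would give, which is the point of chunked counting.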
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Create a `zip` object containing the `CountryName` and `CountryCode` columns in `world_ind`. Unpack the resulting `zip` object and print the tuple values.
- Use a list comprehension to extract the first 25 characters of the `text` column of the `tweets` DataFrame, provided that the tweet is not a retweet (i.e., does not start with "RT").
- Create an iterable reader object so that you can use `next()` to read `datasets/world_ind_pop_data.csv` in chunks of 20.
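A minimal sketch of the first task, assuming a tiny stand-in DataFrame: only the column names `CountryName` and `CountryCode` come from the task, and the rows are hypothetical. A `zip` object is a single-pass iterator, so it is materialized with `list()` before being printed with the `*` unpacking operator and then "unzipped" again with `zip(*...)`.

```python
import pandas as pd

# Hypothetical stand-in for world_ind; only the column names come from the task
toy = pd.DataFrame({
    "CountryName": ["Country A", "Country B"],
    "CountryCode": ["AAA", "BBB"],
})

# Pair each country name with its code; list() consumes the one-shot zip object
pairs = list(zip(toy["CountryName"], toy["CountryCode"]))

# Unpack the tuples as separate arguments to print()
print(*pairs)  # ('Country A', 'AAA') ('Country B', 'BBB')

# zip(*pairs) reverses the pairing and returns the two original columns as tuples
names, codes = zip(*pairs)
```

On the real data the same call is `zip(world_ind["CountryName"], world_ind["CountryCode"])`.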
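For the second task, a conditional list comprehension slices each string with `[:25]` and filters out retweets with `str.startswith("RT")`. The sketch below assumes a small hypothetical `text` column rather than the real `tweets` data.

```python
import pandas as pd

# Hypothetical stand-in for tweets; only the `text` column name comes from the task
toy_tweets = pd.DataFrame({
    "text": [
        "RT @someone: this is a retweet and should be skipped",
        "An original tweet that is longer than twenty-five characters",
        "Short one",
    ]
})

# Keep the first 25 characters of every tweet that does not start with "RT"
first_25 = [text[:25] for text in toy_tweets["text"] if not text.startswith("RT")]
print(first_25)  # ['An original tweet that is', 'Short one']
```

Slicing past the end of a short string simply returns the whole string, so tweets under 25 characters need no special case. If the real column contains missing values (floats), adding an `isinstance(text, str)` check to the condition would avoid an `AttributeError`.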
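For the third task, passing `chunksize` to `pd.read_csv` returns an iterable reader instead of a DataFrame, and each `next()` call yields the following chunk. To keep the sketch self-contained it reads a hypothetical 45-row in-memory CSV (made-up columns and values) so the chunks come out as 20, 20 and 5 rows.

```python
import io

import pandas as pd

# Hypothetical 45-row CSV standing in for datasets/world_ind_pop_data.csv
rows = "\n".join(f"Country {i},C{i:02d},{i * 10}" for i in range(45))
csv_text = "CountryName,CountryCode,Value\n" + rows + "\n"

# With chunksize=20, read_csv returns an iterable reader rather than a DataFrame
reader = pd.read_csv(io.StringIO(csv_text), chunksize=20)

first_chunk = next(reader)   # rows 0-19
second_chunk = next(reader)  # rows 20-39; the index continues from the first chunk
print(first_chunk.shape, second_chunk.shape)  # (20, 3) (20, 3)
```

On the real file the reader is `pd.read_csv('datasets/world_ind_pop_data.csv', chunksize=20)`. Once the last chunk has been returned, another `next()` raises `StopIteration`, which is why the reader also works directly in a `for` loop.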