
Python Data Science Toolbox (Part 1)

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import pandas as pd
from functools import reduce

# Import the dataset
tweets = pd.read_csv('datasets/tweets.csv')

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.


# Define count_entries()
def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: cols_count
    cols_count = {}

    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over the column in DataFrame
    for entry in col:

        # If entry is in cols_count, add 1
        if entry in cols_count:
            cols_count[entry] += 1

        # Else add the entry to cols_count, set the value to 1
        else:
            cols_count[entry] = 1

    # Return the cols_count dictionary
    return cols_count

# Call count_entries() on the tweets DataFrame with the default column: result1
result1 = count_entries(tweets)

# Call count_entries() on the 'source' column: result2
result2 = count_entries(tweets, 'source')

# Print result1 and result2
print(result1)
print(result2)
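
count_entries() raises a bare KeyError when col_name is not a column of the DataFrame. The sketch below is one possible variant that raises a descriptive ValueError instead, then cross-checks its tally against pandas' built-in Series.value_counts(). The name count_entries_checked and the demo DataFrame are hypothetical; they are not part of the course material or of datasets/tweets.csv.

```python
import pandas as pd


def count_entries_checked(df, col_name='lang'):
    """Return a dictionary mapping each value in df[col_name] to its count."""
    # Raise a descriptive error instead of letting a bare KeyError escape
    if col_name not in df.columns:
        raise ValueError(f'The DataFrame does not have a {col_name!r} column.')

    cols_count = {}
    for entry in df[col_name]:
        # dict.get supplies 0 the first time an entry is seen
        cols_count[entry] = cols_count.get(entry, 0) + 1
    return cols_count


# Hypothetical sample data, not rows from datasets/tweets.csv
demo = pd.DataFrame({'lang': ['en', 'en', 'et', 'und', 'en']})
counts = count_entries_checked(demo)
print(counts)

# Cross-check against the tally pandas computes itself
matches_pandas = counts == demo['lang'].value_counts().to_dict()
print(matches_pandas)
```

With the real data, the same call would be count_entries_checked(tweets, 'source'), assuming the hidden cell has been run.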

Explore Datasets

Use the DataFrame imported in the first cell to explore the data and practice your skills!

  • Write a function that takes a timestamp (see column timestamp_ms) and returns the text of any tweet published at that timestamp. Additionally, make it so that users can pass column names as flexible arguments (*args) so that the function can print out any other columns users want to see.
  • In a filter() call, write a lambda function to return tweets created on a Tuesday. Tip: look at the first three characters of the created_at column.
  • Make sure to add error handling on the functions you've created!
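
One way the three exercises above could be approached is sketched here. It is not an official solution. The function names tweets_at and tuesday_tweets are hypothetical, and the sample DataFrame is made-up data that borrows only the column names timestamp_ms, created_at, text, and lang. The sketch also assumes created_at strings start with a three-letter weekday such as 'Tue', as the tip suggests.

```python
import pandas as pd

# Hypothetical sample rows standing in for datasets/tweets.csv
sample = pd.DataFrame({
    'timestamp_ms': [1459294817758, 1459294817758, 1459381217000],
    'created_at': ['Tue Mar 29 23:40:17 +0000 2016',
                   'Tue Mar 29 23:40:17 +0000 2016',
                   'Wed Mar 30 23:40:17 +0000 2016'],
    'text': ['first tweet', 'second tweet', 'third tweet'],
    'lang': ['en', 'en', 'und'],
})


def tweets_at(df, timestamp, *args):
    """Return the text of every tweet in df published at timestamp.

    Extra column names passed via *args are printed for the matching rows.
    """
    # Error handling: report every missing column in one clear message
    missing = [c for c in ('timestamp_ms', 'text', *args) if c not in df.columns]
    if missing:
        raise ValueError(f'DataFrame is missing column(s): {missing}')

    # Error handling: accept ints or numeric strings, reject anything else
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        raise ValueError(f'timestamp must be in whole milliseconds, got {timestamp!r}') from None

    # Coerce the column to numbers so the comparison works for int or str data
    stamps = pd.to_numeric(df['timestamp_ms'], errors='coerce')
    matches = df[stamps == ts]

    # Print whichever extra columns the caller asked for
    if args:
        print(matches[list(args)])

    return matches['text'].tolist()


def tuesday_tweets(df):
    """Return the text of every tweet whose created_at starts with 'Tue'."""
    missing = [c for c in ('created_at', 'text') if c not in df.columns]
    if missing:
        raise ValueError(f'DataFrame is missing column(s): {missing}')

    # Pair each creation date with its text, then keep the Tuesday pairs
    pairs = zip(df['created_at'], df['text'])
    tuesday = filter(lambda pair: str(pair[0])[:3] == 'Tue', pairs)
    return [text for _, text in tuesday]


at_stamp = tweets_at(sample, 1459294817758, 'lang')
on_tuesday = tuesday_tweets(sample)
print(at_stamp)
print(on_tuesday)
```

Both calls return the first two sample tweets. Against the course data the calls would be tweets_at(tweets, some_timestamp, 'lang') and tuesday_tweets(tweets), assuming the hidden cell has been run. Raising ValueError is a design choice for this sketch; returning an empty list or printing a warning would also satisfy the exercise.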