Course Notes: Introduction to Natural Language Processing in Python

    Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

    # Import any packages you want to use here
    import re
    import urllib.request
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    from collections import Counter
    import numpy as np
    import seaborn as sns
    
    import nltk
    from nltk.tokenize import sent_tokenize, word_tokenize, regexp_tokenize
    from nltk.corpus import stopwords
    from nltk.sentiment import SentimentIntensityAnalyzer
    
    nltk.download('punkt')
    nltk.download('stopwords')

    Introduction to Regular Expressions

    Emails

    You have a string that contains a list of email addresses separated by commas. Your task is to extract all the email addresses from the string using regular expressions.

    Example string: "John Doe [email protected], Jane Smith [email protected], Bob Johnson [email protected]"

    Expected output: ["[email protected]", "[email protected]", "[email protected]"]

    string = "John Doe <[email protected]>, Jane Smith <[email protected]>, Bob Johnson <[email protected]>"
    # [A-Za-z]{2,} is the top-level domain; a "|" inside a character class would match a literal pipe
    emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', string)
    print(emails)
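
A minimal sketch of the same extraction that can be run end to end. The addresses below are made-up placeholders on example domains (an assumption, they are not the addresses from the exercise), and the names `text`, `email_pattern`, and `found_emails` are illustrative.

```python
import re

# Placeholder addresses (hypothetical, not taken from the original exercise)
text = ("John Doe <john.doe@example.com>, "
        "Jane Smith <jane_smith@example.org>, "
        "Bob Johnson <bob@mail.example.net>")

# Local part, "@", domain labels, then a top-level domain of two or more letters
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

found_emails = re.findall(email_pattern, text)
print(found_emails)
```

This pattern is a practical approximation for notes like these, not a full validator for every legal email address.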

    Phones

    You have a string that contains a list of phone numbers separated by commas. Your task is to extract all the phone numbers from the string using regular expressions.

    string = "John Doe: 555-1234, Jane Smith: 555-5678, Bob Johnson: 555-9012"
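    # Exactly three digits, a hyphen, then exactly four digits, bounded so longer digit runs are not matched
    pattern = r'\b\d{3}-\d{4}\b'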
    output = re.findall(pattern, string)
    print(output)
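
If the names are needed alongside the numbers, capture groups can return both at once. This is only a sketch, assuming every entry follows the "Name: 555-1234" shape of the example string; `pair_pattern` and `phone_book` are illustrative names.

```python
import re

text = "John Doe: 555-1234, Jane Smith: 555-5678, Bob Johnson: 555-9012"

# Group 1: the name before the colon; group 2: a 3-digit, hyphen, 4-digit number
pair_pattern = r'([A-Za-z ]+?):\s*(\d{3}-\d{4})'

# With two groups, re.findall returns a list of (name, number) tuples
phone_book = {name.strip(): number for name, number in re.findall(pair_pattern, text)}
print(phone_book)
```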

    Weekends


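    You have a string that contains a list of dates in the format "YYYY-MM-DD". Your task is to extract all the dates that fall on a weekend (Saturday or Sunday). A regular expression can find the date strings, but it cannot tell which day of the week a date is, so the weekday check is done with the datetime module.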

    Example string: "2023-05-01, 2023-05-02, 2023-05-03, 2023-05-04, 2023-05-05, 2023-05-06, 2023-05-07"

    Expected output: ["2023-05-06", "2023-05-07"]

    import datetime
    
    string = "2023-05-01, 2023-05-02, 2023-05-03, 2023-05-04, 2023-05-05, 2023-05-06, 2023-05-07"
    
    dates = []
    
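    # The regex finds the date strings; datetime works out the day of the week
    for date_str in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', string):
        year, month, day = map(int, date_str.split("-"))
        date = datetime.date(year, month, day)
        # weekday() counts from Monday = 0, so 5 and 6 are Saturday and Sunday
        if date.weekday() in [5, 6]:
            dates.append(date_str)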
    
    print(dates)
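
A regular expression only checks the shape of a date, so a string such as "2023-02-30" matches the pattern even though it is not a real date. The sketch below assumes such entries should simply be skipped; the helper name `weekend_dates` and the sample input are illustrative, not part of the exercise.

```python
import re
from datetime import datetime

def weekend_dates(text):
    """Return the YYYY-MM-DD strings in text that are real dates on a Saturday or Sunday."""
    result = []
    for candidate in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text):
        try:
            parsed = datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue  # right shape, but not a real calendar date
        if parsed.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
            result.append(candidate)
    return result

print(weekend_dates("2023-05-05, 2023-05-06, 2023-05-07, 2023-02-30"))
```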
    

    Vowels

    You have a string that contains a list of words separated by commas. Your task is to extract all the words that start with a vowel using regular expressions.

    Example string: "apple, banana, cherry, date, eggplant, fig, grapefruit"

    Expected output: ["apple", "eggplant"]

    string = "apple, banana, cherry, date, eggplant, fig, grapefruit"
    pattern = r'\b[aeiouAEIOU][a-zA-Z]*\b'
    print(re.findall(pattern, string))
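
The same match can be written with the `re.IGNORECASE` flag instead of listing both cases in the character class. This is a sketch on a made-up word list (the sample string is an assumption, chosen to include capitalized words).

```python
import re

# Hypothetical sample with mixed capitalization
text = "Apple, banana, Orange, umbrella, kiwi"

# One lowercase class is enough because the flag makes the match case-insensitive
vowel_pattern = r'\b[aeiou][a-z]*\b'

vowel_words = re.findall(vowel_pattern, text, flags=re.IGNORECASE)
print(vowel_words)
```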