Let's apply our natural language processing knowledge to Twitter. Tweets are notoriously difficult to process: they are shorter than most texts and usually contain hard-to-parse content like hashtags, mentions, links and emoji.
Despite the difficulties, tweets are fun content, so in this notebook we'll take a look at classifying tweets from two prominent North American politicians. Can we determine whether the author is Donald Trump or Justin Trudeau based on just a tweet? Let's see!
- 1. Tweet classification: Trump vs. Trudeau
- 2. Transforming our collected data
- 3. Vectorize the tweets
- 4. Training a multinomial naive Bayes model
- 5. Evaluating our model using a confusion matrix
- 6. Trying out another classifier: Linear SVC
- 7. Introspecting our top model
- 8. Bonus: can you write a Trump or Trudeau tweet?
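The vectorizing, training, and evaluation tasks listed above can be sketched roughly as follows. This is a minimal sketch, assuming scikit-learn is the library in use; the project's actual data loading, vectorizer settings, and variable names are not shown here, so everything below (the toy texts, the `author_a`/`author_b` labels, and the `results` dictionary) is hypothetical and invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data: invented placeholder strings, NOT real tweets.
train_texts = [
    "huge rally tonight great crowd",
    "great deal huge win tonight",
    "proud community together diversity",
    "together stronger proud community partners",
]
train_labels = ["author_a", "author_a", "author_b", "author_b"]
test_texts = ["huge crowd great win", "proud partners together"]
test_labels = ["author_a", "author_b"]
label_order = ["author_a", "author_b"]

# Vectorize: learn the vocabulary on the training texts only,
# then reuse that same vocabulary for the test texts.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Train both classifiers and collect one confusion matrix per model.
results = {}
for name, model in [("MultinomialNB", MultinomialNB()), ("LinearSVC", LinearSVC())]:
    model.fit(X_train, train_labels)
    predictions = model.predict(X_test)
    # Rows are true labels, columns are predicted labels, in label_order.
    results[name] = confusion_matrix(test_labels, predictions, labels=label_order)
    print(name)
    print(results[name])
```

With real data, the texts and labels would come from the collected tweets and a held-out test split rather than hand-written lists, and the two confusion matrices give a quick side-by-side view of which classifier confuses the two authors less often.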
Katharine Jarmul runs a data analysis company called kjamistan that specializes in helping companies analyze data and training others on data analysis best practices, particularly with Python. She has been using Python for 8 years for a variety of data work, including telling stories at major national newspapers, building large-scale aggregation software, making decisions based on customer analytics and marketing spend, and advising new ventures on the competitive landscape.