Let's apply our natural language processing knowledge to Twitter. Tweets are notoriously difficult to analyze: they are shorter than most texts and often contain hard-to-parse content such as hashtags, mentions, links and emoji. Despite these difficulties, tweets make fun material, so in this notebook we'll classify tweets from two prominent North American politicians. Can we tell whether a tweet was written by Donald Trump or Justin Trudeau from the text alone? Let's see!
1. Tweet classification: Trump vs. Trudeau
2. Transforming our collected data
3. Vectorize the tweets
4. Training a multinomial naive Bayes model
5. Evaluating our model using a confusion matrix
6. Trying out another classifier: Linear SVC
7. Introspecting our top model
8. Bonus: can you write a Trump or Trudeau tweet?
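To make the vectorize, train and evaluate steps in the outline concrete, here is a minimal, standard-library-only sketch of a bag-of-words multinomial naive Bayes classifier scored with a confusion matrix. It is not the notebook's code: in practice a library such as scikit-learn supplies these pieces, and the hand-rolled versions here only show the mechanics. The helper names (`tokenize`, `MultinomialNB`, `confusion_matrix`), the tokenizing regex and the tiny training set are all hypothetical; the "tweets" are invented strings, not real tweets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep plain words, @mentions and #hashtags as tokens.
    return re.findall(r"[#@]?\w+", text.lower())


class MultinomialNB:
    """Hypothetical bag-of-words multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)  # per-class token counts
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of P(token | class).
        v = len(self.vocab)
        return math.log((self.word_counts[c][token] + self.alpha) / (self.totals[c] + self.alpha * v))

    def predict_one(self, text):
        # Drop tokens unseen in training, as a vectorizer fitted on the training set would.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def confusion_matrix(y_true, y_pred, classes):
    # Rows are true labels, columns are predicted labels.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Invented toy data for illustration only.
train_texts = [
    "great rally tonight #maga",
    "america will be great again",
    "fake news all day",
    "merci canada #cdnpoli",
    "proud of all canadians today",
    "happy canada day merci",
]
train_labels = ["trump", "trump", "trump", "trudeau", "trudeau", "trudeau"]
test_texts = ["great america #maga", "merci to all canadians"]
test_labels = ["trump", "trudeau"]

model = MultinomialNB().fit(train_texts, train_labels)
preds = model.predict(test_texts)
cm = confusion_matrix(test_labels, preds, model.classes)
print(preds)  # ['trump', 'trudeau']
print(cm)     # [[1, 0], [0, 1]] (row/column order: trudeau, trump)
```

Working in log space avoids numeric underflow when a tweet has many tokens, and the add-alpha smoothing keeps a word that never appeared with one class from zeroing out that class's score. The Linear SVC and model-introspection steps from the outline are not covered by this sketch.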
Katharine Jarmul runs kjamistan, a data analysis company that specializes in helping companies analyze data and training others in data analysis best practices, particularly with Python. She has been using Python for 8 years for a variety of data work, including telling stories at major national newspapers, building large-scale aggregation software, making decisions based on customer analytics and marketing spend, and advising new ventures on the competitive landscape.