It is estimated that over 70% of potentially usable business information is unstructured, often in the form of text data. Text mining provides a collection of techniques that allows us to derive actionable insights from unstructured data. In this course, we explore the basics of text mining using the bag-of-words method. The first three chapters introduce a variety of essential topics for analyzing and visualizing text data. The final chapter allows you to apply everything you've learned in a real-world case study to extract insights from employee reviews of two major tech companies.
Jumping into Text Mining with Bag-of-Words (Free)
In this chapter, you'll learn the basics of using the bag-of-words method for analyzing text data.
What is text mining? (50 xp)
Understanding text mining (50 xp)
Quick taste of text mining (100 xp)
Getting started (50 xp)
Load some text (100 xp)
Make the vector a VCorpus object (1) (100 xp)
Make the vector a VCorpus object (2) (100 xp)
Make a VCorpus from a data frame (100 xp)
Cleaning and preprocessing text (50 xp)
Common cleaning functions from tm (100 xp)
Cleaning with qdap (100 xp)
All about stop words (100 xp)
Intro to word stemming and stem completion (100 xp)
Word stemming and stem completion on a sentence (100 xp)
Apply preprocessing steps to a corpus (100 xp)
The TDM & DTM (50 xp)
Understanding TDM and DTM (50 xp)
Make a document-term matrix (100 xp)
Make a term-document matrix (100 xp)
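To make the cleaning and matrix steps named in this chapter concrete, here is a minimal sketch of a bag-of-words document-term matrix. It is written in plain Python for illustration only; the course itself works in R with the tm and qdap packages, and none of the names below (STOP_WORDS, clean, document_term_matrix) come from those packages. The stop-word list and the three sample documents are made up for the example.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "and", "of", "i", "my"}

def clean(text):
    """Lowercase, keep only alphabetic runs, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def document_term_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term."""
    counts = [Counter(clean(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Made-up sample documents.
docs = [
    "I love the coffee!",
    "Coffee is my morning ritual.",
    "The chardonnay and the coffee",
]
vocab, dtm = document_term_matrix(docs)

# A term-document matrix is the transpose: one row per term.
tdm = [list(row) for row in zip(*dtm)]
```

Word order is discarded on purpose: each document is reduced to term counts, which is what "bag of words" refers to.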
Word Clouds and More Interesting Visuals
This chapter will teach you how to visualize text data in a way that's both informative and engaging.
Common text mining visuals (50 xp)
Test your understanding of text mining (50 xp)
Frequent terms with tm (100 xp)
Frequent terms with qdap (100 xp)
Intro to word clouds (50 xp)
A simple word cloud (100 xp)
Stop words and word clouds (100 xp)
Plot the better word cloud (100 xp)
Improve word cloud colors (100 xp)
Use prebuilt color palettes (100 xp)
Other word clouds and word networks (50 xp)
Find common words (100 xp)
Visualize common words (100 xp)
Visualize dissimilar words (100 xp)
Polarized tag cloud (100 xp)
Visualize word networks (100 xp)
Teaser: simple word clustering (100 xp)
Adding to Your TM Skills
In this chapter, you'll learn more basic text mining techniques based on the bag-of-words method.
Simple word clustering (50 xp)
Test your understanding of text mining (50 xp)
Distance matrix and dendrogram (100 xp)
Make a dendrogram friendly TDM (100 xp)
Put it all together: a text-based dendrogram (100 xp)
Dendrogram aesthetics (100 xp)
Using word association (100 xp)
Getting past single words (50 xp)
N-gram tokenization (50 xp)
Changing n-grams (100 xp)
How do bigrams affect word clouds? (100 xp)
Different frequency criteria (50 xp)
Changing frequency weights (100 xp)
Capturing metadata in tm (100 xp)
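As a rough illustration of the n-gram tokenization covered in this chapter, the sketch below slides a window of n words over a token list. It is plain Python, not the R tokenizer the course uses, and the function name ngrams and the sample sentence are assumptions made for the example.

```python
def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample sentence, split on whitespace.
tokens = "text mining is fun".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Counting these multi-word tokens instead of single words keeps phrases together, at the cost of a larger and sparser vocabulary.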
Battle of the Tech Giants for Talent
This chapter ties everything together with a case study in text mining for HR analytics.
Amazon vs. Google (50 xp)
Organizing a text mining project (50 xp)
Step 1: Problem definition (50 xp)
Step 2: Identifying the text sources (100 xp)
Step 3: Text organization (50 xp)
Text organization (100 xp)
Working with Google reviews (100 xp)
Steps 4 & 5: Feature extraction & analysis (50 xp)
Feature extraction & analysis: amzn_pros (100 xp)
Feature extraction & analysis: amzn_cons (100 xp)
amzn_cons dendrogram (100 xp)
Word association (100 xp)
Quick review of Google reviews (100 xp)
Cage match! Amazon vs. Google pro reviews (100 xp)
Cage match, part 2! Negative reviews (100 xp)
Step 6: Reach a conclusion (50 xp)
Draw conclusions, insights, or recommendations (50 xp)
Draw another conclusion, insight, or recommendation (50 xp)
Finished! (50 xp)
In the following tracks: Text Mining with R
Datasets: Coffee tweets; Chardonnay tweets; Anonymous online reviews: Amazon; Anonymous online reviews: Google
Ted Kwartler
Adjunct Professor, Harvard University
Ted Kwartler is the VP, Trusted AI at DataRobot. At DataRobot, Ted sets product strategy for explainable and ethical uses of data technology in the company's application. Ted brings unique insights and experience utilizing data, business acumen, and ethics to his current and previous positions at Liberty Mutual Insurance and Amazon. In addition to having four DataCamp courses, he teaches graduate courses at the Harvard Extension School and is the author of Text Mining in Practice with R.