Intermediate Regular Expressions in R

Manipulate text data, analyze it and more by mastering regular expressions and string distances in R.
4 Hours | 14 Videos | 48 Exercises | 2,159 Learners
3650 XP

Course Description

Analyzing data that comes in tables is fun. But what if the things we find most interesting are not available as a neatly organized dataset, only as plain text? Do not despair: in this course, you'll learn everything you need to know to create powerful regular expressions that help you pull the information you need for your analyses out of a blob of text. That is not all. Using the concept of string distances, you will learn to work even with text that contains typos or scanning errors, because you will be able to match such strings to their correct counterparts from other data sources (record linkage). As learning material, we will analyze real documents about box office figures in Swiss cinemas.

  1. Regular Expressions: Writing custom patterns

    Regular expressions can be pretty intimidating at first because they contain so many special characters. In this chapter, you'll learn to decipher these and write your own patterns to find exactly what you're looking for.
  2. Creating strings with data

    In this chapter, we will step away from regular expressions a little and focus on string manipulation: creating strings from other data structures such as vectors or lists.
  3. Extracting structured data from text

    One task where regular expressions really shine is making sense of a blob of text. In this chapter, you'll learn to extract information from messy data that doesn't come in neatly arranged tables but in plain text.
  4. Similarities between strings

    In the last chapter, we will shift gears from regular expressions to string distances. By calculating how much strings differ from one another, we can match those that are similar. This helps us find duplicates even when they contain small errors like typos, and it is an important part of record linkage, where we combine datasets from multiple sources.
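
As a rough illustration of the two techniques described in chapters 3 and 4, here is a minimal sketch in base R. It is not taken from the course: the sample text, the film titles, and all variable names are invented for illustration, and the course itself may well use other packages or functions. The first part extracts values from plain text with a regular expression; the second matches misspelled strings to a clean reference list using edit distances.

```r
# Hypothetical sample text (an assumption, not the course's data set).
report <- "Top films: Frozen II (2019), Joker (2019), Parasite (2020)"

# Extract every four-digit year that appears inside parentheses.
# Lookbehind and lookahead require perl = TRUE in base R.
pattern <- "(?<=\\()\\d{4}(?=\\))"
years <- regmatches(report, gregexpr(pattern, report, perl = TRUE))[[1]]

# Hypothetical clean titles and scanned titles containing typos.
reference <- c("Frozen II", "Joker", "Parasite")
scanned   <- c("Frzen II", "Jokr", "Parasit3")

# adist() returns a matrix of generalized Levenshtein (edit) distances:
# one row per scanned title, one column per reference title.
distances <- adist(scanned, reference)

# For each scanned title, take the reference title with the smallest distance.
matched <- reference[apply(distances, 1, which.min)]

print(years)
print(data.frame(scanned = scanned, matched = matched))
```

Picking the nearest reference string is the simplest possible linkage rule. With real data, one would usually also set a maximum distance so that strings with no plausible counterpart stay unmatched.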
Adel Nehme, Amy Peterson
Introduction to R, Introduction to the Tidyverse, String Manipulation with stringr in R

Angelo Zehr

Data Journalist
Angelo Zehr works as a data journalist at SRF, the Swiss public broadcaster. In his work, he regularly has to search and make sense of large amounts of messy text. He also teaches data journalism at the University of Applied Sciences in Chur and gives other courses at the Swiss School of Journalism.
