
Regular Expressions Cheat Sheet

A regular expression (regex or regexp) is a pattern of characters that describes a set of matching text. Regular expressions are one of the most widely used tools in natural language processing and allow you to supercharge common text data manipulation tasks.
Oct 5, 2022

Use this cheat sheet as a handy reminder when working with regular expressions.


More on regular expressions 

To process regexes, you use a “regex engine.” Each engine uses a slightly different syntax, called a regex flavor. A list of popular engines can be found here. Python and R, two common programming languages discussed on DataCamp, each have their own engine.

Since a regex describes a pattern of text, it can be used to check whether a pattern exists in a text, extract substrings from longer strings, and help make adjustments to text. A regex can be as simple as a specific word, or it can be more advanced and find looser patterns of characters, like the top-level domain in a URL.
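
As a minimal sketch of those three uses (checking, extracting, adjusting) with Python's built-in re module. The sample text and the patterns are made up for illustration and are deliberately loose, not production-grade email or URL matchers:

```python
import re

# Hypothetical sample text, invented for this example.
text = "Contact us at info@example.com or visit https://www.example.org/about"

# 1. Check whether a pattern exists anywhere in the text.
has_email = re.search(r"\w+@\w+\.\w+", text) is not None

# 2. Extract a substring: capture the top-level domain of the URL.
tld_match = re.search(r"https?://[\w.-]+\.([a-z]+)", text)
tld = tld_match.group(1) if tld_match else None

# 3. Adjust the text: replace the email address with a placeholder.
masked = re.sub(r"\w+@\w+\.\w+", "<email>", text)

print(has_email)
print(tld)
print(masked)
```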

Definitions

  • Literal Character: A literal character is the most basic regular expression you can use. It simply matches the actual character you write. So if you are trying to represent an “r,” you would write r. 
  • Metacharacter: A metacharacter is a character that has a special meaning to the regex engine instead of matching itself. Metacharacters can do things like signify the beginning of a line, the end of a line, or any single character. To match a metacharacter literally, you typically put a \ in front of it. 
  • Character Class: A character class (or character set) tells the engine to look for one of a list of characters. It is signified by [ and ] with the characters you are looking for in the middle of the brackets. 
  • Capture Group: A capture group is signified by opening and closing round parentheses. It lets you group part of a regex so that other regex features, like quantifiers (see below), apply to the whole group, and it saves the text matched by the group so you can extract it or refer back to it.  
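
The four definitions above can be tried out in a few lines. This is a small sketch using Python's re module; the test strings are invented for illustration:

```python
import re

# Literal character: "r" matches the letter r itself.
literal = re.findall(r"r", "parrot")

# Metacharacter: "." means "any character"; "\." means a literal dot.
any_char = re.findall(r"a.", "a1 a. ab")
literal_dot = re.findall(r"a\.", "a1 a. ab")

# Character class: [ae] matches exactly one of the listed characters.
char_class = re.findall(r"gr[ae]y", "gray or grey, not groy")

# Capture group: the parentheses group "ha" so that + repeats the pair,
# and group(1) holds the text captured by the last repetition.
grouped = re.search(r"(ha)+", "hahaha!")

print(literal, any_char, literal_dot, char_class)
print(grouped.group(0), grouped.group(1))
```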

Anchors

Anchors match a position before or after other characters.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| ^ | match start of string, or start of each line in multiline mode | ^r | rabbit; raccoon | parrot; ferret |
| $ | match end of string, or end of each line in multiline mode | t$ | rabbit; foot | trap; star |
| \A | match start of string | \Ar | rabbit; raccoon | parrot; ferret |
| \Z | match end of string | t\Z | rabbit; foot | trap; star |
| \b | match a word boundary (the position at the start or end of a word) | \bfox\b | the red fox ran; the fox ate | foxtrot; foxskin scarf |
| \B | match a position that is not a word boundary | \Bee\B | trees; beef | bee; tree |
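
A short sketch of the anchors above in Python's re module. The word lists reuse the cheat sheet's examples; the two-line string at the end is an added illustration of how ^ differs from \A once multiline mode is on:

```python
import re

words = ["rabbit", "parrot", "foot", "trap"]

# ^ anchors the pattern to the start, $ to the end.
starts_with_r = [w for w in words if re.search(r"^r", w)]
ends_with_t = [w for w in words if re.search(r"t$", w)]

# \b is a word boundary, so "fox" must stand alone as a word.
whole_word = [s for s in ["the red fox ran", "foxtrot"] if re.search(r"\bfox\b", s)]

# \B is the opposite: "ee" needs a word character on both sides.
inside_word = [w for w in ["trees", "beef", "bee", "tree"] if re.search(r"\Bee\B", w)]

# In multiline mode ^ matches at every line start, while \A still
# means the start of the whole string.
text = "parrot\nrabbit"
caret_hits = re.findall(r"^r\w+", text, flags=re.MULTILINE)
string_start_hits = re.findall(r"\Ar\w+", text, flags=re.MULTILINE)

print(starts_with_r, ends_with_t, whole_word, inside_word)
print(caret_hits, string_start_hits)
```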

Matching types of character

Rather than matching specific characters, you can match specific types of characters such as letters, numbers, and more.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| . | match any character except a line break | c.e | clean; cheap | acert; cent |
| \d | match a digit | \d | 6060-842; 2b\|^2b | two; **___ |
| \D | match a non-digit | \D | The 5 cats ate; 12 Angry men | 52; 10032 |
| \w | match a word character (letter, digit, or underscore) | \wee\w | trees; bee4 | The bee; eels eat meat |
| \W | match a non-word character | \Wbat\W | At bat!; Swing the bat fast | wombat; bat53 |
| \s | match whitespace | \sfox\s | the fox ate; his fox ran | it’s the fox.; foxfur |
| \S | match non-whitespace | \See\S | trees; beef | the bee stung; The tall tree |
| \metacharacter | escape a metacharacter to match it literally | \.; \^ | The cat ate.; 2^3 | the cat ate; 23 |
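
To see these character types at work, here is a small sketch in Python's re module. The sample sentence is invented for illustration:

```python
import re

# Hypothetical sample sentence.
sample = "Order 66: 12 cats, 3 dogs_and 1 fox."

digits = re.findall(r"\d+", sample)          # runs of digits
words = re.findall(r"\w+", sample)           # runs of letters, digits, underscores
non_word = re.findall(r"\W", "a-b c")        # single non-word characters
pieces = re.split(r"\s+", "the  fox\tate")   # split on any run of whitespace
dots = re.findall(r"\.", sample)             # escaped dot: literal "." only

print(digits)
print(words)
print(non_word, pieces, dots)
```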

Character classes

Character classes are sets or ranges of characters.

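| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| [xy] | match any one of the listed characters | gr[ea]y | gray; grey | green; greek |
| [x-y] | match any one character in a range | [a-e] | amber; brand | fox; join |
| [^xy] | match any one character that is not listed | gr[^ea]y | groyne; gruyere | gray; grey |
| [\^-] | match metacharacters literally inside a character class (a hyphen placed last is literal) | 4[\^.+*/-]\d | 4^3; 4.2 | 44; 23 |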
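
A brief sketch of sets, negated sets, and ranges in Python's re module. The word list and the operator string are illustrative; the hyphen is placed last inside the brackets so that it is read as a literal hyphen rather than as a range:

```python
import re

words = ["gray", "grey", "green", "groyne"]

# [ea] accepts exactly one character: e or a.
either = [w for w in words if re.search(r"gr[ea]y", w)]

# [^ea] accepts exactly one character that is neither e nor a.
neither = [w for w in words if re.search(r"gr[^ea]y", w)]

# [a-e] is a range of characters.
in_range = re.findall(r"[a-e]", "fox and bee")

# Inside a class most metacharacters are literal; ^ is escaped and - goes last.
operators = re.findall(r"4[\^.+*/-]\d", "4^3 4.2 4-1 44 23")

print(either, neither)
print(in_range, operators)
```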



Repetition

Rather than matching single instances of characters, you can match repeated characters.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| x* | match zero or more times | ar*o | cacao; carrot | arugula; artichoke |
| x+ | match one or more times | re+ | gree; tree | trap; ruined |
| x? | match zero or one times | ro?a | roast; rant | root; rear |
| x{m} | match m times | \we{2}\w | deer; seer | red; enter |
| x{m,} | match m or more times | 2{3,}4 | 671-2224; 2222224 | 224; 123 |
| x{m,n} | match between m and n times | 12{1,3}3 | 1234; 1222384 | 15335; 1222223 |
| x*?, x+?, etc. | match the minimum number of times (known as a lazy quantifier) | re+? | tree; freeeee | trout; roasted |
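
The difference between a greedy and a lazy quantifier is easiest to see by looking at what is actually matched, not just whether there is a match. A minimal sketch in Python's re module, reusing the cheat sheet's example words:

```python
import re

# * allows zero or more r's, + needs at least one e, ? makes the o optional.
zero_or_more = re.findall(r"ar*o", "cacao carrot")
one_or_more = re.findall(r"re+", "tree trap gree")
optional = re.findall(r"ro?a", "roast rant root")

# {m,n} bounds the number of repetitions.
bounded = re.findall(r"12{1,3}3", "1234 1222384 1222223")

# Greedy quantifiers take as much as they can; a trailing ? makes them lazy.
greedy = re.search(r"re+", "freeeee").group()
lazy = re.search(r"re+?", "freeeee").group()

print(zero_or_more, one_or_more, optional, bounded)
print(greedy, lazy)
```

Both re+ and re+? find a match in "freeeee"; they differ only in how many e characters end up inside it.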

Capturing, alternation & backreferences

In order to extract specific parts of a string, you can capture those parts, and even name the parts that you captured.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| (x) | capture a pattern | (iss)+ | Mississippi; missed | mist; persist |
| (?:x) | create a group without capturing | (?:ab)(cd) | Match: abcd; Group 1: cd | acbd |
| (?<name>x) | create a named capture group | (?<first>\d)(?<second>\d)\d* | Match: 1325; first: 1; second: 3 | 2; hello |
| (x\|y) | match one of several alternative patterns | (re\|ba) | red; banter | rant; bear |
| \n | reference a previous capture, where n is the group index starting at 1 | (b)(\w*)\1 | blob; bribe | bear; bring |
| \k<name> | reference a named capture | (?<first>5)(\d*)\k<first> | 51245; 55 | 523; 51 |
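
Flavors differ most in this area. As a sketch of the same ideas in Python's re module, note that Python spells a named group (?P<name>x) and a named backreference (?P=name), rather than the (?<name>x) and \k<name> forms listed above:

```python
import re

# A non-capturing group (?:...) groups without taking a group number.
m = re.search(r"(?:ab)(cd)", "abcd")
whole, first_group = m.group(0), m.group(1)

# Named groups, in Python's (?P<name>...) spelling.
digits = re.search(r"(?P<first>\d)(?P<second>\d)\d*", "1325")
named = digits.groupdict()

# Alternation: match either "re" or "ba".
alternatives = [w for w in ["red", "banter", "rant", "bear"]
                if re.search(r"(re|ba)", w)]

# Backreferences: \1 repeats group 1; (?P=first) repeats the named group.
numbered = [w for w in ["blob", "bribe", "bear", "bring"]
            if re.search(r"(b)(\w*)\1", w)]
by_name = [s for s in ["51245", "55", "523", "51"]
           if re.search(r"(?P<first>5)(\d*)(?P=first)", s)]

print(whole, first_group, named)
print(alternatives, numbered, by_name)
```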

Lookahead and lookbehind

You can specify that specific characters must appear before or after your match, without including those characters in the match.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| (?=x) | positive lookahead: the next characters must match x, without being used in the match | an(?=an); iss(?=ipp) | banana; Mississippi | band; missed |
| (?!x) | negative lookahead: the next characters must not match x | ai(?!n) | fail; brail | faint; train |
| (?<=x) | positive lookbehind: the previous characters must match x, without being used in the match | (?<=tr)a | trail; translate | bear; streak |
| (?<!x) | negative lookbehind: the previous characters must not match x | (?<!tr)a | bear; translate | trail; strained |
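
A small sketch of the four lookarounds in Python's re module, reusing the example words above. The price string at the end is an invented illustration showing that the text checked by the lookaround is left out of the match:

```python
import re

# Positive lookahead: "an" only when another "an" follows.
ahead = re.findall(r"an(?=an)", "banana")

# Negative lookahead: "ai" not followed by "n".
not_ahead = [w for w in ["fail", "brail", "faint", "train"]
             if re.search(r"ai(?!n)", w)]

# Positive lookbehind: "a" preceded by "tr".
behind = [w for w in ["trail", "translate", "bear", "streak"]
          if re.search(r"(?<=tr)a", w)]

# Negative lookbehind: "a" not preceded by "tr".
not_behind = [w for w in ["bear", "translate", "trail", "strained"]
              if re.search(r"(?<!tr)a", w)]

# The dollar sign must be there, but it is not part of the result.
price = re.search(r"(?<=\$)\d+", "Total: $42").group()

print(ahead, not_ahead, behind, not_behind, price)
```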

Literal matches and modifiers

Modifiers are settings that change the way the matching rules work.

| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| \Qx\E | treat everything between \Q and \E as literal text | \Qtell\E; \Q\d\E | tell; \d | I’ll say this; I have 5 coins |
| (?i)x(?-i) | make x case-insensitive | (?i)te(?-i) | sTep; tEach | Trench; bear |
| (?x)x(?-x) | ignore whitespace in the pattern | (?x)t a p(?-x) | tap; tapdance | c a t; rot a potato |
| (?s)x(?-s) | turn on single-line/DOTALL mode, which makes “.” match new-line symbols (\n) in addition to everything else | (?s)first and.second(?-s).and third | first and↵second and third | first and↵second↵and third |
| (?m)x(?-m) | turn on multiline mode, which makes ^ and $ match at the start and end of each line rather than only at the start and end of the string | (?m)^eat and sleep$ | eat and sleep↵↵eat and↵sleep | treat and sleep↵↵eat and sleep. |

In the examples, ↵ marks a line break inside the text.
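
As a sketch of how these modifiers look in Python's re module: each mode can be passed as a flag argument or, for a whole pattern, written as an inline prefix such as (?i). Python's re does not provide \Q...\E, so re.escape stands in for it here. The test strings are illustrative:

```python
import re

# Case-insensitive matching: flag argument or inline (?i) prefix.
ci_flag = bool(re.search(r"te", "sTep", flags=re.IGNORECASE))
ci_inline = bool(re.search(r"(?i)te", "tEach"))

# Verbose mode ignores whitespace in the pattern, so "t a p" means "tap".
verbose = bool(re.search(r"t a p", "tapdance", flags=re.VERBOSE))

# DOTALL lets "." match a newline as well.
two_lines = "first and\nsecond"
dot_default = re.search(r"and.second", two_lines)
dot_all = re.search(r"and.second", two_lines, flags=re.DOTALL)

# MULTILINE makes ^ and $ match at each line, not only the whole string.
lines = "eat and sleep\n\neat and\nsleep"
single = re.search(r"^eat and sleep$", lines)
multi = re.search(r"^eat and sleep$", lines, flags=re.MULTILINE)

# re.escape quotes a string so every character is matched literally.
literal = re.search(re.escape(r"\d"), r"use \d for digits")

print(ci_flag, ci_inline, verbose)
print(dot_default, dot_all.group(), single, multi.group(), literal.group())
```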

Unicode

Regular expressions can work beyond the Roman alphabet, with things like Chinese characters or emoji.

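  • Code Points: A code point is the number, usually written in hexadecimal, that identifies an abstract character in a system like Unicode. 
  • Graphemes: A grapheme is what a reader perceives as a single character. Each grapheme is made up of one or more code points in a sequence; for example, è can be a single code point or an e followed by a combining accent. 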

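| Syntax | Description | Example pattern | Example matches | Example non-matches |
| --- | --- | --- | --- | --- |
| \uXXXX | match the character with hexadecimal code point XXXX | \u0040gmail | @gmail; www.email@gmail | gmail; @aol |
| \X | match one grapheme, such as an accented character, however many code points it is stored as | ^\X$ | è (stored as \u00e8 or as \u0065\u0300) | ee |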
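
The two ways of storing è can be inspected directly. This sketch uses Python's re and unicodedata modules; Python's built-in re has no \X, so instead of matching graphemes it normalizes the text to one form first, which is a swapped-in technique and not part of the cheat sheet itself:

```python
import re
import unicodedata

# \uXXXX names a character by code point: U+0040 is "@".
at_sign = re.search(r"\u0040gmail", "www.email@gmail")

precomposed = "\u00e8"        # è as a single code point
decomposed = "\u0065\u0300"   # e followed by a combining grave accent

# Both display as è, but they are different code point sequences.
same_text = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# A pattern written for one form does not match the other form...
direct = re.search(r"\u00e8", decomposed)

# ...unless the text is normalized first (NFC composes letter + accent).
normalized = unicodedata.normalize("NFC", decomposed)
after_nfc = re.search(r"\u00e8", normalized)

print(at_sign.group(), same_text, lengths)
print(direct, after_nfc is not None)
```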
