Why do we use ‘r’ in Regular Expressions (regex)?

Gaurav Patil
4 min read · Sep 21, 2021


Photo by Erik Mclean on Unsplash

When I started learning Regular Expressions, the thing that kept me baffled for so long was the use of ‘r’. I went through articles and documentation, but somehow I was not satisfied with the explanations. So I started experimenting with it, using as many different kinds of examples as I could to gain better insight. This is an attempt to help other confused souls by sharing what I did to make sense of it, with a multitude of examples.

Let me first talk about Python escape sequences and regex special characters.

There are many escape sequences in Python. For instance, ‘\t’ represents a tab and ‘\n’ represents a new line. When these escape sequences are used inside an ordinary Python string, Python replaces each of them with a single tab or newline character.
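A quick way to see this is to check the length and character code of each escape sequence. A minimal sketch; the sample text is made up for illustration:

```python
# Each escape sequence becomes ONE character in the final string
tab = "\t"
newline = "\n"

print(len(tab), len(newline))  # 1 1
print(ord(tab), ord(newline))  # 9 10 (ASCII codes for tab and newline)

# Inside text, they lay the output out with a tab and a line break
print("Name:\tApple\nColor:\tRed")
```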

Just like Python has escape sequences, regex has its own special characters, or metacharacters, which have special meanings inside a regex pattern: . ^ $ * + ? { } [ ] \ | ( )
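As a small illustration (the sample text is an assumption, not from a real dataset), the dot below is a metacharacter that matches any single character, and a backslash in front of it turns it back into a literal dot:

```python
import re

text = "cat cot c.t"

# '.' is a metacharacter: it matches any single character
any_char = re.findall("c.t", text)
# '\.' escapes the metacharacter, so only a literal dot matches
literal_dot = re.findall(r"c\.t", text)

print(any_char)     # ['cat', 'cot', 'c.t']
print(literal_dot)  # ['c.t']
```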

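The letter ‘r’ stands for ‘raw string’: prefixing a string literal with r tells Python to keep every backslash as an ordinary character instead of treating it as the start of an escape sequence.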
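A minimal sketch of what the prefix changes. The r only affects how Python reads the literal in your source code; the result is an ordinary str:

```python
# r is only a prefix for the literal; the result is a normal str object
raw = r"\n"
escaped = "\\n"  # the same two characters, written with a doubled backslash

print(type(raw))       # <class 'str'>
print(len(raw))        # 2: a backslash and the letter n
print(raw == escaped)  # True
print(len("\n"))       # 1: a single newline character
```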

The first question that popped up in my mind was:

Is ‘r’ (raw string) used only in regex? Big NO!!!

Let me illustrate it using the following examples:

Use of ‘r’ in simple Python code

Use of ‘r’ to read the file path in pandas

Regex with Python escape sequence

Regex with regex special character

1. Use of ‘r’ in simple Python code:
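A minimal sketch of the difference when printing; the sample strings are illustrative:

```python
# Without r: Python turns \t into a tab and \n into a newline
normal = "Hello\tWorld\nBye"
# With r: the backslashes stay, so \t and \n are two characters each
raw = r"Hello\tWorld\nBye"

print(normal)  # Hello<TAB>World, then Bye on the next line
print(raw)     # Hello\tWorld\nBye
print(len(normal), len(raw))  # 15 17
```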

2. Use of ‘r’ to read the file path in pandas:

Here we try to open a CSV file named ‘fruits’ and load it into a pandas DataFrame.
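The snippet below sketches why the prefix matters for Windows-style paths. The path C:\new\fruits.csv is hypothetical, and the pandas call is left as a comment because it needs pandas installed and the file to exist. In a normal string, \n becomes a newline and \f becomes a form feed, so the path silently changes; a path starting with C:\Users is even a SyntaxError, because \U starts a Unicode escape.

```python
# Hypothetical Windows path; the folder name "new" is only for illustration
plain_path = "C:\new\fruits.csv"  # \n -> newline, \f -> form feed
raw_path = r"C:\new\fruits.csv"   # backslashes kept as typed

print(repr(plain_path))  # 'C:\newruits.csv' with a real newline and form feed inside
print(repr(raw_path))    # 'C:\\new\\fruits.csv'
print(len(plain_path), len(raw_path))  # 15 17

# With pandas installed and the file present, the raw path is what you would pass:
# import pandas as pd
# df = pd.read_csv(raw_path)
```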

Now that you’ve seen raw strings used outside regex, let’s use them in regex, both with Python escape sequences and with regex metacharacters.

1. Regex with Python escape sequence

We use the findall() function from Python’s re module in the following examples. It finds all non-overlapping matches of a pattern in a given string and returns them as a list. We then use len() to count the number of matches.

First example:
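A minimal sketch, assuming a made-up text with real newline characters in it. Both patterns find the newlines: without r, Python converts \n into a newline before re sees it; with r, re receives backslash + n and interprets that as a newline itself.

```python
import re

# Two real newline characters
text = "apple\nbanana\ncherry"

# Python converts "\n" into a newline before re sees the pattern
plain_matches = re.findall("\n", text)
# re receives backslash + n and reads it as "newline" on its own
raw_matches = re.findall(r"\n", text)

print(len(plain_matches), len(raw_matches))  # 2 2
```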

Second example:
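A sketch of the trickier case, assuming the text contains the literal two characters backslash and n rather than a newline. Now the number of backslashes matters:

```python
import re

# "\\n" in a normal string is a literal backslash followed by "n" (no newline)
text = "apple\\nbanana\\ncherry"

# Normal string: "\\\\n" gives re the pattern \\n (literal backslash, then n)
escaped_matches = re.findall("\\\\n", text)
# Raw string: r"\\n" gives re the same pattern with half the typing
raw_matches = re.findall(r"\\n", text)
# Pitfall: "\\n" reaches re as \n, which re reads as a newline, so nothing matches
newline_matches = re.findall("\\n", text)

print(len(escaped_matches), len(raw_matches), len(newline_matches))  # 2 2 0
```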

Now we use the sub() function. Its first argument is the pattern (or exact word) to replace, the second is the replacement text, and the third is the string in which the replacement should be done. Let’s look at the Python escape sequence example again, this time using sub().
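Continuing with a made-up text, here is a minimal sketch that uses re.sub() to turn each literal backslash-n into a real newline, once with r and once with doubled backslashes:

```python
import re

# Literal backslash + n sequences, not newlines
text = "apple\\nbanana\\ncherry"

# Replace every literal backslash-n with a real newline character
result = re.sub(r"\\n", "\n", text)
# Same substitution without r: the pattern needs four backslashes
result_escaped = re.sub("\\\\n", "\n", text)

print(result)
# apple
# banana
# cherry
print(result == result_escaped)  # True
```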

2. Regex with regex special character:

Here we use ‘\s’ (lowercase ‘s’), which in regex matches any whitespace character. We try to find the number of times ‘\s’ appears in a given text using findall() and len().

Note: We are not counting whitespace characters but occurrences of the literal text ‘\s’.

First example:
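A minimal sketch, assuming a made-up text that holds the literal sequence \s twice and one real space. With r"\s", re treats \s as the whitespace metacharacter, so it counts spaces, not the literal text. (Without r, "\s" happens to survive because \s is not a Python escape sequence, but recent Python versions warn about such invalid escapes.)

```python
import re

# Literal \s twice, plus one real space before "four"
text = "one\\stwo\\sthree four"

# r"\s" is the regex metacharacter: it matches whitespace, not the text "\s"
whitespace_matches = re.findall(r"\s", text)

print(len(whitespace_matches))  # 1 (only the space before "four")
```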

Second example:
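Counting the literal \s in the same made-up text needs the pattern \\s, which can be written with four backslashes in a normal string or two in a raw string:

```python
import re

text = "one\\stwo\\sthree four"

# re needs the pattern \\s to match a literal backslash followed by s
escaped_matches = re.findall("\\\\s", text)  # normal string: four backslashes
raw_matches = re.findall(r"\\s", text)       # raw string: two backslashes

print(len(escaped_matches), len(raw_matches))  # 2 2
```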

Here again we use sub(), this time to replace the literal ‘\s’ with an actual space.
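A minimal sketch of the replacement on the same made-up text:

```python
import re

text = "one\\stwo\\sthree four"

# Replace each literal \s with a real space
result = re.sub(r"\\s", " ", text)

print(result)  # one two three four
```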

So now it’s clear that we can use extra backslashes as an alternative to ‘r’. Here it was easy to tell whether the text contained a Python escape sequence or a regex metacharacter and then decide how many backslashes to use. In practice, with huge datasets, you may not have that advantage. So it’s better to always prefix regex patterns with ‘r’. I hope you now have a better understanding of raw strings.
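One more illustration of why the prefix is a safe default. This example is not from the experiments above: \b is a Python escape sequence (backspace) and also a regex special sequence (word boundary), so forgetting r changes the pattern silently:

```python
import re

text = "the cat sat on the mat"

# In a normal string, "\b" is Python's backspace character (\x08),
# so re searches for backspaces around "cat" and finds none
plain = re.findall("\bcat\b", text)
# In a raw string, \b reaches re intact and means "word boundary"
raw = re.findall(r"\bcat\b", text)

print(plain, raw)  # [] ['cat']
```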

Thank you!!!

Written by Gaurav Patil

Machine Learning Engineer drawing insights from mathematics.
