Tutorial 1: Regular Expressions in Python - Finding a Word in a Given Text


WHAT:

Regex stands for ‘Regular Expression’. It is a pattern that describes a substring to be found in a given string. In layman’s terms, it is a generalized description of a piece of text (a word, a number, any other character, or a combination of these) to be searched for in a given text.

WHY:

Let me try to explain it using an example.

Suppose you are tasked with finding all the employee IDs present in a given text. You know that every employee ID starts with two uppercase letters followed by five digits. Using this knowledge, your Python code might look like this:

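def employee_id(text):
    employee_id_list = []
    # An employee ID has 7 characters; the +1 lets an ID at the very end of the text be checked.
    for x in range(len(text) - 7 + 1):
        letters, digits = text[x:x+2], text[x+2:x+7]
        # isalpha() is needed too: ' A'.isupper() is True because the space is ignored.
        if letters.isalpha() and letters.isupper() and digits.isdigit():
            employee_id_list.append(text[x:x+7])
    return employee_id_list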

employee_id('Employees with IDs AC23455 and HB45968 are to be promoted.')
output:
['AC23455', 'HB45968']

Corresponding code using regex:

import re
text = 'Employees with ids AC23455 and HB45968 are to be promoted.'
print(re.findall(r'[A-Z]{2}\d{5}',text))
output:
['AC23455', 'HB45968']

Some of you might say that this can be done more neatly with a list comprehension, and there is no denying that. But even then, you still need ‘for’ and ‘if’ along with the isupper() and isdigit() methods. Keeping that in mind, I want you to appreciate the simplicity and comfort that regex provides with just one pattern:

r'[A-Z]{2}\d{5}'

At first, it may look like complex code. But bear with me, and gradually everything will make sense.
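As a rough preview, the pattern can be read piece by piece. The sketch below is only an illustration: it uses re.fullmatch, which checks whether a whole string fits a pattern and is not otherwise covered in this tutorial, and the sample strings are made up.

```python
import re

# [A-Z] matches exactly one uppercase letter from A to Z.
one_letter = re.fullmatch(r'[A-Z]', 'A') is not None

# {2} repeats the preceding item exactly twice.
two_letters = re.fullmatch(r'[A-Z]{2}', 'AC') is not None

# \d matches one digit, so \d{5} matches exactly five digits.
five_digits = re.fullmatch(r'\d{5}', '23455') is not None

# Put together: two uppercase letters followed by five digits.
full_id = re.fullmatch(r'[A-Z]{2}\d{5}', 'AC23455') is not None

# A lowercase prefix does not fit the pattern.
bad_id = re.fullmatch(r'[A-Z]{2}\d{5}', 'ac23455') is not None

print(one_letter, two_letters, five_digits, full_id, bad_id)
```

The first four checks succeed and the last one fails, which is the same rule the longer function above had to spell out with slices and string methods.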

HOW:

Let’s learn how to use it. We will start with a simple example: finding all instances of a word or substring in a given string.

text = ''' Many people still conflate Google with the internet. They don't know that Google is actually a search engine like Bing, Baidu, Yahoo. However aforementioned fact surely reflects the prevalent use of Google. '''

import re
regex_object = re.compile(r'Google')
print(regex_object.findall(text))
output:
['Google', 'Google', 'Google']

It’s time to decipher the code.

text

It is the string variable holding the text in which we want to find the substring ‘Google’.

import re

‘re’ is a module. Simply put, a module is a file of Python code consisting of functions, classes and variables. ‘re’ contains the functions we need for working with regex. Every time we use regex, we need to import it, because these functions ship with Python’s standard library but are not among Python’s built-in functions.
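A small check can make the difference between built-in functions and module functions concrete. This is just a sketch; the variable names in_builtins and in_re are chosen here for illustration.

```python
import builtins
import re

# findall is not one of Python's built-in functions...
in_builtins = hasattr(builtins, 'findall')
# ...but it is available once the re module has been imported.
in_re = hasattr(re, 'findall')

print(in_builtins)  # False
print(in_re)        # True
```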

regex_object = re.compile(r'Google')

re.compile compiles a regular expression pattern into a regular expression object, whose methods can then be used for matching. Here, r'Google' is the regex pattern and ‘Google’ is the substring to be searched for in a given string. The ‘r’ prefix marks a raw string, in which Python does not treat backslashes as escape characters.
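To see what the ‘r’ prefix changes, and why a compiled object is handy, here is a minimal sketch. The two sample sentences are made up for illustration.

```python
import re

# In a normal string, '\n' is a single newline character.
normal_len = len('\n')
# In a raw string, r'\n' stays as two characters: a backslash and the letter n.
raw_len = len(r'\n')
print(normal_len, raw_len)  # 1 2

# A compiled regex object can be reused on as many strings as we like.
regex_object = re.compile(r'Google')
first = regex_object.findall('Google is a search engine.')
second = regex_object.findall('Bing and Baidu are search engines too.')
print(first)   # ['Google']
print(second)  # []
```

For a plain word like ‘Google’ the raw prefix makes no difference, but it becomes important once a pattern contains backslashes, as in \d.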

findall() method

This is a method that returns a list of all the matches of the pattern in the given string. An empty list means no matches were found.
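Since findall() always returns a list, the result can be handled like any other list. The sketch below assumes a hypothetical helper, count_matches, and a made-up sample sentence; neither is part of the ‘re’ module.

```python
import re

def count_matches(pattern, text):
    """Return how many times pattern occurs in text (hypothetical helper)."""
    matches = re.compile(pattern).findall(text)
    if not matches:  # an empty list is falsy, meaning no match was found
        return 0
    return len(matches)

sample = 'Google is a search engine. Many people use Google daily.'
print(count_matches(r'Google', sample))  # 2
print(count_matches(r'Yahoo', sample))   # 0
```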

A few examples:

text = " Is nature a creation of God or God itself ? "

import re
regex_code = re.compile(r'god')
print(regex_code.findall(text))
output:
[]

Here an empty list is returned because we passed ‘god’ as the regex pattern, which is not the same as ‘God’. When using an exact substring as a regex pattern, note that matching is case-sensitive.
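If you do want to ignore case, one option is the re.IGNORECASE flag, which re.compile accepts as a second argument. This goes slightly beyond the current tutorial, so treat it as a sketch of one possible approach rather than the only one.

```python
import re

text = " Is nature a creation of God or God itself ? "

# Default behaviour: 'god' does not match 'God'.
case_sensitive = re.compile(r'god').findall(text)

# With re.IGNORECASE, letters match regardless of case.
case_insensitive = re.compile(r'god', re.IGNORECASE).findall(text)

print(case_sensitive)    # []
print(case_insensitive)  # ['God', 'God']
```

Note that findall() returns the text as it appears in the string, so the matches come back as ‘God’, not ‘god’.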

text = " P2P stands for peer-to-peer network. In a peer-to-peer network, peers are computer systems connected to each other via internet connection. "

import re
regex_code = re.compile(r'P2P')
print(regex_code.findall(text))
output:
['P2P']

I hope you now have a basic idea about regex. We will dig deeper in the upcoming tutorials. Stay tuned.

Take a deep breath and move on to the next tutorial. You can find the link to the next tutorial in the comment box.

Happy learning!!!

I am a keen learner and diligent teacher with special interest in mathematics and machine learning.

Gaurav Patil
