Regular expressions and file patterns in 30 minutes

Why should you read this article?

Regular expressions (Regex) are a powerful tool every software engineer should know. They let you search for files and search, transform, process, or extract text: pretty much anything to do with text. This article gives you an introduction to Regex and its uses. If you have always wanted to learn Regex but found it confusing or cryptic, trust me, I felt the same when I was learning it. So I am gonna make this learning curve really easy for you.

Your Regex playground

We are going to use http://regexr.com/ to play around. Later we will see how to use Regex in Unix commands such as find, grep, awk, and sed.

There are two types of characters

1. Meta characters. These characters have a special meaning: for example, ‘+’ doesn’t mean addition in regex, and ‘.’ doesn’t mean a full stop. They form the essence of Regex. You’ll read more about them in the following sections.

2. Normal characters. Characters that are NOT meta characters. Example: ‘There was a Big Bug in my Bag’ consists entirely of normal characters.

Regex = Combination of (Meta characters + Normal characters)

Legend

I am going to use blue to indicate a Meta character.

I am going to use red to indicate the actual textual matches. 

When I use italics (for placeholders such as normalCharacters), please don’t read them literally.

Search Patterns

Regex Definition Example
. Matches any single character except a newline.
Text:
There was a Big Bug in my Bag.
Regex:
B.g
Matches:
There was a Big Bug in my Bag.
Matched text: Big, Bug, Bag
Explanation:
B.g translates to: I would like to match text that starts with B, followed by . (any single character except a newline), followed by g.
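To try this outside regexr, here is a minimal sketch using Python’s standard re module (Python is my choice for illustration; the article itself does not use it). re.findall returns every non-overlapping match as a list:

```python
import re

text = "There was a Big Bug in my Bag."

# . matches any single character except a newline
matches = re.findall(r"B.g", text)
print(matches)  # ['Big', 'Bug', 'Bag']
```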
[normalCharacters] Matches any one of the enclosed normal characters.
Text:
There was a Big Bug in my Bag.
Regex:
B[iu]g
Matches:
There was a Big Bug in my Bag.
Matched text: Big, Bug
Explanation:
B[iu]g translates to: I would like to match text that starts with B, followed by either i or u, followed by g.
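A small illustrative sketch of the same character class in Python’s standard re module (an assumption of this example, not a tool the article uses):

```python
import re

text = "There was a Big Bug in my Bag."

# [iu] matches exactly one character: either i or u
matches = re.findall(r"B[iu]g", text)
print(matches)  # ['Big', 'Bug']
```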
[^normalCharacters] Matches any single character that is not one of the enclosed normal characters.
Text:
There was a Big Bug in my Bag.
Regex:
B[^iu]g
Matches:
There was a Big Bug in my Bag.
Matched text: Bag
Explanation:
B[^iu]g translates to: I would like to match text that starts with B, followed by any single character other than i or u, followed by g.
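A minimal Python sketch (standard re module, chosen here only for illustration) showing that the negated class leaves only Bag:

```python
import re

text = "There was a Big Bug in my Bag."

# [^iu] matches exactly one character that is NOT i or u
matches = re.findall(r"B[^iu]g", text)
print(matches)  # ['Bag']
```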
[fromNormalCharacter-toNormalCharacter] Matches any single normal character within the range.
Text:
Hey Sam, here is my phone number 222-232-2312
Regex:
[0-9]
Matches:
Hey Sam, here is my phone number 222-232-2312
Matched text: every digit, one digit per match
Explanation:
[0-9] translates to: I would like to match a single digit in the range 0 to 9.
 
Text:
Hey Sam, here is my phone number 222-232-2312
Regex:
[A-Z]
Matches:
Hey Sam, here is my phone number 222-232-2312
Matched text: H, S
Explanation:
[A-Z] translates to: I would like to match a single capital letter from A to Z, i.e., ABCDEFGHIJKLMNOPQRSTUVWXYZ.
Text:
Hey Sam, here is my phone number 222-232-2312
Regex:
[a-zA-Z]
Matches:
Hey Sam, here is my phone number 222-232-2312
Matched text: every letter, one letter per match
Explanation:
[a-zA-Z] translates to: I would like to match a single lower case letter (abcdefghijklmnopqrstuvwxyz) or upper case letter (ABCDEFGHIJKLMNOPQRSTUVWXYZ). You can put several ranges inside one pair of brackets.
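A short Python sketch of the three range examples, using the standard re module (my choice for illustration, not the article’s tool). Each bracket expression matches exactly one character, so re.findall returns one character per match:

```python
import re

text = "Hey Sam, here is my phone number 222-232-2312"

# [0-9] matches one digit; findall returns each digit as its own match
digits = re.findall(r"[0-9]", text)
print("".join(digits))  # 2222322312

# [A-Z] matches one capital letter
capitals = re.findall(r"[A-Z]", text)
print(capitals)  # ['H', 'S']

# [a-zA-Z] combines two ranges: any lower or upper case letter
letters = re.findall(r"[a-zA-Z]", text)
print("".join(letters))  # HeySamhereismyphonenumber
```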
^normalCharactersAndOrMetacharacters Matches the normal characters and/or meta characters following ^ only at the beginning of a line.
Text:
The quick brown fox
The quick green fox
Quick yellow fox
Regex:
^The
Matches:
The quick brown fox
The quick green fox
Quick yellow fox
Matched text: The at the start of the first two lines
Explanation:
^The translates to: I would like to match The only at the beginning of a line. That is why the third line, which starts with Quick, does not match.
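In Python’s re module (used here only as an illustration), ^ matches at the start of the whole string by default; the re.MULTILINE flag makes it match at the start of every line, as in this sketch:

```python
import re

text = "The quick brown fox\nThe quick green fox\nQuick yellow fox"

# re.MULTILINE makes ^ match at the start of every line
line_starts = re.findall(r"^The", text, flags=re.MULTILINE)
print(line_starts)  # ['The', 'The']

# Without the flag, ^ only matches at the very start of the string
string_start = re.findall(r"^The", text)
print(string_start)  # ['The']
```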
normalCharactersAndOrMetacharacters$ Matches the normal characters and/or meta characters before $ only at the end of a line.
Text:
The quick brown fox
The quick brown fox
Quick brown raccoon
Regex:
fox$
Matches:
The quick brown fox
The quick brown fox
Quick brown raccoon
Matched text: fox at the end of the first two lines
Explanation:
fox$ translates to: I would like to match fox only at the end of a line. The third line ends with raccoon, so it does not match.
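The same idea as a small Python sketch with the standard re module (an illustrative choice). re.MULTILINE is needed for $ to match at the end of every line rather than only at the end of the whole string; checking line by line is an alternative:

```python
import re

text = "The quick brown fox\nThe quick brown fox\nQuick brown raccoon"

# With re.MULTILINE, $ matches at the end of every line
matches = re.findall(r"fox$", text, flags=re.MULTILINE)
print(matches)  # ['fox', 'fox']

# Per-line yes/no check: no flag needed, since each line is its own string
ends_with_fox = [bool(re.search(r"fox$", line)) for line in text.splitlines()]
print(ends_with_fox)  # [True, True, False]
```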
normalCharactersAndOrMetacharacters|normalCharactersAndOrMetacharacters Matches either the normal characters and/or meta characters before | or those after it.
Text:
This is a car
This is a bike
Regex:
car|bike
Matches:
This is a car
This is a bike
Matched text: car, bike
Explanation:
car|bike translates to: I would like to match either car or bike.
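A tiny Python sketch of alternation, using the standard re module purely for illustration:

```python
import re

text = "This is a car\nThis is a bike"

# | accepts either the alternative on its left or the one on its right
matches = re.findall(r"car|bike", text)
print(matches)  # ['car', 'bike']
```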
(normalCharactersAndOrMetacharacters)\n You can save part of your regex by enclosing it in () and replay it later with \n, where n is the number of the enclosure: \1 replays the first, \2 the second, and so on.
Text:

I’m Slim Shady, yes I’m the real Shady
All you other Slim Shadys are just imitating
So won’t the real Slim Shady, please stand up,
Please stand up,
Please stand up

Regex:

I’m Slim (Shady), yes I’m the real \1
All you other Slim \1s are just imitating
So won’t the real Slim \1, please (stand up),
Please \2,
Please \2

Matches:

I’m Slim Shady, yes I’m the real Shady
All you other Slim Shadys are just imitating
So won’t the real Slim Shady, please stand up,
Please stand up,
Please stand up
Matched text: all five lines

Explanation:

That was Eminem’s song, btw lol. Notice how we saved (Shady) and are replaying it with \1. The next enclosure, (stand up), is replayed with \2: since it’s the second enclosure, we access it with \2.
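Here is the same replay idea as a minimal Python sketch (standard re module, my choice for illustration). Because the lyrics span several lines, the pattern spells out each line break as \n; in this code \n really is a newline, not the \1, \2 replay syntax:

```python
import re

lyrics = (
    "I’m Slim Shady, yes I’m the real Shady\n"
    "All you other Slim Shadys are just imitating\n"
    "So won’t the real Slim Shady, please stand up,\n"
    "Please stand up,\n"
    "Please stand up"
)

# (Shady) is enclosure 1 and (stand up) is enclosure 2;
# \1 and \2 must match exactly the text those enclosures saved
pattern = (
    r"I’m Slim (Shady), yes I’m the real \1\n"
    r"All you other Slim \1s are just imitating\n"
    r"So won’t the real Slim \1, please (stand up),\n"
    r"Please \2,\n"
    r"Please \2"
)

match = re.fullmatch(pattern, lyrics)
print(match.group(1))  # Shady
print(match.group(2))  # stand up
```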

Quantifiers

The following meta characters are used in Regex as quantifiers, i.e., they specify how many times the preceding character or regex must occur.

Regex Definition Example
normalCharactersAndOrMetacharacters{numberOfTimes} Matches the single character that immediately precedes it exactly numberOfTimes times. The preceding item can also be a regex, such as [0-9].
Text:
Hey Sam, here is my phone number 222-232-2312
Regex:
[0-9]{3}-[0-9]{3}-[0-9]{4}
Matches:
Hey Sam, here is my phone number 222-232-2312
Matched text: 222-232-2312
Explanation:
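[0-9]: match any single digit.
{3}: match it three times.
Therefore, [0-9]{3} translates to: match a digit from 0 to 9, three times in a row. We just wrote a regex for a US phone number in the 222-232-2312 format. Note that it only checks the shape of the text, not whether the number actually exists.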
normalCharactersAndOrMetacharacters{fromRange,toRange} Matches the single character that immediately precedes it at least fromRange and at most toRange times. The preceding item can also be a regex.
Text:
My age is 101. I am too old.
My age is 1. I was just born.
My age is 25. I am young.
My age is 1023. I am a dracula.
Regex:
My age is [0-9]{1,3}
Matches:
My age is 101. I am too old.
My age is 1. I was just born.
My age is 25. I am young.
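My age is 1023. I am a dracula.
Matched text: My age is 101, My age is 1, My age is 25, and My age is 102 on the last line
Explanation:
[0-9]{1,3} translates to: match a digit from 0 to 9, at least 1 and at most 3 times. On the last line, the regex stops after three digits, so it still matches My age is 102; nothing in the regex checks what comes after the digits. To reject 1023 entirely, require the full stop right after the digits: My age is [0-9]{1,3}\. (escaping is covered in Escaped characters below).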
normalCharactersAndOrMetacharacters? Matches zero or one occurrence of the preceding character/regex, i.e., it makes it optional.
Text:
I don’t like the color blue, I like the colour green.
Regex:
colou?r
Matches:
I don’t like the color blue, I like the colour green.
Matched text: color, colour
Explanation:
colou?r translates to: I would like to match c, o, l, o, optionally followed by a u, followed by r. That is why both color and colour match.
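A minimal Python sketch with the standard re module (an illustrative choice) showing ? making the u optional:

```python
import re

text = "I don’t like the color blue, I like the colour green."

# u? means "zero or one u"
matches = re.findall(r"colou?r", text)
print(matches)  # ['color', 'colour']
```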
normalCharactersAndOrMetacharacters* Matches zero or more occurrences of the preceding character/regex.
Text:
b be bee beers!!!
Regex:
be*
Matches:
b be bee beers!!!
Matched text: b, be, bee, and the bee in beers
Explanation:
be* translates to: I would like to match b followed by zero or more e characters. That is why the lone b matches too.
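A small Python sketch (standard re module, for illustration only). Note that * is greedy: in beers it takes both e characters:

```python
import re

text = "b be bee beers!!!"

# e* means "zero or more e", so a lone b also matches
matches = re.findall(r"be*", text)
print(matches)  # ['b', 'be', 'bee', 'bee']
```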
normalCharactersAndOrMetacharacters+ Matches one or more occurrences of the preceding character/regex.
Text:
b be bee beers!!!
Regex:
be+
Matches:
b be bee beers!!!
Matched text: be, bee, and the bee in beers
Explanation:
be+ translates to: I would like to match b followed by one or more e characters. Unlike be*, the lone b does not match.
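The same text with + in a short Python sketch (standard re module, an illustrative choice); the lone b is now skipped because at least one e is required:

```python
import re

text = "b be bee beers!!!"

# e+ means "one or more e"
matches = re.findall(r"be+", text)
print(matches)  # ['be', 'bee', 'bee']
```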

Escaped characters

Regex Definition Example
\metaCharacter Turns off the special meaning of the metacharacters we saw above.
Text:
This is the end of a sentence.
Regex:
sentence\.
Matches:
This is the end of a sentence.
Matched text: sentence. (including the full stop)
Explanation:
Since . is a meta character in regex that means any single character except a newline, we have to escape it as \. to treat it as a literal full stop. Without the backslash, sentence. would also match text such as sentences.
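A Python sketch with the standard re module (an illustrative choice). I added a second sentence to the sample text so the difference between . and \. shows up. re.escape, part of the standard library, adds the backslashes for you when you want to search for text literally:

```python
import re

text = "This is the end of a sentence. Two sentences here."

# Unescaped, . matches any character, so "sentences" matches too
unescaped = re.findall(r"sentence.", text)
print(unescaped)  # ['sentence.', 'sentences']

# Escaped, \. only matches a literal full stop
escaped = re.findall(r"sentence\.", text)
print(escaped)  # ['sentence.']

# re.escape backslash-escapes the meta characters in a plain string
print(re.escape("sentence."))  # sentence\.
```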

Search and replace patterns

Nowadays, pretty much every text editor comes with search and replace functionality, and you could totally use it. But there are times when you have to search and replace text in thousands of files, and we can’t be doing that manually. This is where regex helps. A search and replace pattern has this general form:

/normalCharactersAndOrMetacharactersToSearchFor/normalCharactersAndOrMetacharactersToReplaceWith

Additional meta characters used in search and replace patterns

Regex Definition Example
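\n Inserts the text saved by the nth () enclosure of the search pattern (\1 for the first, \2 for the second, and so on).
& Represents the text that matched the search pattern.
Text: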

Ducks quack

Regex:

/quack/&&

Searches and Replaces:

Ducks quackquack

Explanation:

/quack/&& translates to: search for the text quack. In the replacement, every & stands for the matched text, quack. Therefore, && means quackquack, and Ducks quack becomes Ducks quackquack.
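Python’s re.sub (standard library, used here only as an illustration) does search and replace, but its replacement syntax differs from the /search/replace form above: the whole match is written \g<0> rather than &, and an & is just a literal character. Saved enclosures are replayed with \1, \2, and so on:

```python
import re

text = "Ducks quack"

# \g<0> is Python's spelling of & (the whole matched text)
doubled = re.sub(r"quack", r"\g<0>\g<0>", text)
print(doubled)  # Ducks quackquack

# In Python, & in the replacement is just a literal ampersand
literal = re.sub(r"quack", "&&", text)
print(literal)  # Ducks &&

# \1 and \2 insert what the first and second () enclosures saved
swapped = re.sub(r"([a-zA-Z]+) ([a-zA-Z]+)", r"\2 \1", text)
print(swapped)  # quack Ducks
```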
