Tome's Land of IT

IT Notes from the Powertoe – Tome Tanasovski

Powershell – Part 5 – Regular Expressions, oh my!

A lot of people get intimidated by Regular Expressions, and maybe you should.  However, if you plan on doing any type of text manipulation or searching you will need to learn the basics.  Administrator tasks are littered with getting dumps of data into text form and having to extract, reformat, or find a way to make the data useful.  Regular Expressions do just that.

Before I continue I must mention that the be-all-end-all source of information for regular expressions is “Chapter 5 – Pattern Matching” in the Perl Bible, “Programming Perl” (The Camel Book) from O’Reilly.  I highly recommend reading this chapter a few times (God knows I must have read it over 50 times).  It contains everything you need to know about regular expressions.  I don’t plan on discussing how to form regular expressions in this tutorial, but I want to show how Powershell implements regular expressions.

Now let’s get dirty.  Powershell implements two different methods of regular expressions.  There is a built-in method using comparison operators and there is the method of using a .Net Regex.

Comparison Operator Implementation

get-help about_Comparison_Operators
get-help about_regular_expressions

Comparison operators are things like -eq (equal), -gt (greater than), and -ne (not equal).  The regular expression operators are -match, -notmatch, and -replace, and by default they ignore case.  There are also additional operators that you may not find listed in the above help topics.  They do the exact same thing as their counterparts, but they are case sensitive: -cmatch, -cnotmatch, and -creplace.
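Here is a small sketch of how the default and case-sensitive operators differ.  The sample string 'Test' and the variable names are made up for illustration:

```powershell
# The default operators ignore case, so a lowercase t in the pattern matches the capital T
$insensitive = 'Test' -match '^t'        # True

# The c-prefixed operators are case sensitive, so the capital T no longer matches
$sensitive = 'Test' -cmatch '^t'         # False

# -notmatch is simply the negation of -match
$negated = 'Test' -notmatch '^t'         # False

# -creplace only touches the lowercase t at the end
$replaced = 'Test' -creplace 't', 'w'    # Tesw

$insensitive, $sensitive, $negated, $replaced
```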

'test' -match '^t' 

The above returns true because ‘test’ contains a ‘t’ as its first character.  If we apply the -match above to the code from Part 2 of this series of tutorials that reads our dictionary file we can create a script that will list all words that start with the letter ‘t’:

Get-Content dictionary.txt | foreach { if ($_ -match '^t') { Write-Output $_ } } 
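The same filter can be sketched without a file on disk.  The word list below is a made-up stand-in for dictionary.txt, and Where-Object is used in place of the foreach/if pair:

```powershell
# Hypothetical word list standing in for the contents of dictionary.txt
$words = 'apple', 'table', 'Tiger', 'boat', 'tree'

# Where-Object passes along only the items for which the script block is true
$tWords = $words | Where-Object { $_ -match '^t' }

# When the left side of -match is an array, the operator returns the matching items
$alsoTWords = $words -match '^t'

$tWords
```

Both variables end up holding table, Tiger, and tree.  Tiger is included because -match ignores case.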

In addition to using matches to determine if something is true you can also use them to capture groups within your matches.

'Powertoe is the best' -match '(\w+) is the (\w+)'
foreach ($key in ($matches.Keys | Sort-Object)) { Write-Output "$key = $($matches[$key])" }

The $matches special variable holds the results of the most recent successful match.  It is a hashtable, so the loop above walks its keys rather than the variable itself.  $matches[0] will always show the entire regex match (as it does in a .net match capture), while $matches[1] will show the contents captured by your first parenthesis pair, $matches[2] will show the contents of the second pair, etc.
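Capture groups can also be given names, which tends to read better than numeric indexes.  This is a minimal sketch using .Net's (?<name>...) syntax.  The group names subject and rating are made up for this example:

```powershell
# -match returns True or False, so it can drive an if statement directly
if ('Powertoe is the best' -match '(?<subject>\w+) is the (?<rating>\w+)') {
    $whole   = $matches[0]            # the entire match
    $subject = $matches['subject']    # first named group
    $rating  = $matches['rating']     # second named group
    "$subject -> $rating"             # Powertoe -> best
}
```

Checking the result of -match before reading $matches matters because the variable is only updated when a match succeeds.  After a failed match it still holds the results of the previous successful one.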

-replace is very similar to -match, but there is one very important thing to note.  -replace does not stop at the first match the way a plain regular expression substitution does.  It always does a global replace of every match in the input (perl equivalent: s///g):

'test' -replace 't', 'w' 

The output from the above operator is ‘wesw’ as opposed to the ‘west’ you would get from a replace that stops at the first match.  To replace only the first match you need to use the .Net implementation of regular expressions.
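Capture groups work with -replace as well.  The sketch below reuses the earlier sample sentence and refers to the captured groups as $1 and $2 in the replacement string.  Note the single quotes, which stop Powershell from trying to expand $1 and $2 as its own variables:

```powershell
# $1 and $2 refer to the first and second parenthesis pairs in the pattern
$swapped = 'Powertoe is the best' -replace '(\w+) is the (\w+)', '$2 is the $1'
$swapped    # best is the Powertoe

# Every match is replaced, not just the first one
$global = 'test' -replace 't', 'w'
$global     # wesw
```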

.Net Implementation

System.Text.RegularExpressions.Regex is the name of the class that implements regular expressions in .Net.  Here is how you would normally create an object of the Regex type:

$regex = new-object System.Text.RegularExpressions.Regex('^test$',[System.Text.RegularExpressions.RegexOptions]::MultiLine) 

All methods and properties available to this class are available to Powershell.  However, Powershell also gives a nice shortcut for creating regex objects.  The above command can also be written as:

$regex = [regex] '(?m)^test$' 
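To see what the (?m) option buys you, here is a rough sketch that runs the same pattern with and without it.  The three-line sample string is made up, and it uses `n (a bare line feed) as the line separator:

```powershell
# Three lines separated by line feeds; 'test' sits alone on the middle line
$text = "first`ntest`nlast"

$plain = [regex] '^test$'        # ^ and $ only anchor to the whole string
$multi = [regex] '(?m)^test$'    # ^ and $ also anchor at each line break

$plainResult = $plain.IsMatch($text)       # False
$multiResult = $multi.IsMatch($text)       # True
$position    = $multi.Match($text).Index   # 6, where the middle line starts

$plainResult, $multiResult, $position
```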

As mentioned, we can now use this implementation of .Net’s Regex to replace only the first match:

$regex = [regex]'t'
$regex.Replace("test","w",1)
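The third argument to Replace is the maximum number of replacements to make.  The following sketch varies it, and also shows one difference from -replace that is easy to trip over: a [regex] object is case sensitive unless you ask otherwise.  The variable names are only for illustration:

```powershell
$regex = [regex] 't'

$first = $regex.Replace('test', 'w', 1)   # west, only the first t is replaced
$both  = $regex.Replace('test', 'w', 2)   # wesw, up to two replacements
$all   = $regex.Replace('test', 'w')      # wesw, no count means replace every match

# Unlike -replace, the .Net object is case sensitive by default,
# so the capital T is skipped and the first match is the final t
$cased = $regex.Replace('Test', 'w', 1)   # Tesw

$first, $both, $all, $cased
```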

That’s the down and dirty of it.  I apologize if this topic is a little bit more advanced than my others have been. I know it expects you to have an understanding of regular expressions.  I highly suggest reading both the pattern matching chapter of the Camel book as well as the supporting documentation about the Regex class to ensure that you learn all of the power available to you within regular expressions.  After that it’s just a matter of setting yourself to tasks that do string manipulation.  After the next couple of tutorials we should be able to start tackling real puzzles where you can challenge yourself to implement your own regular expressions so that you can start learning how to form them.

======Update======

I gave a very in-depth talk on Regular Expressions to the UK user group in March 2011.  The live meeting recording of the presentation is online to view here.

One response to “Powershell – Part 5 – Regular Expressions, oh my!”

  1. Jeremy June 28, 2012 at 12:56 pm

    Thanks for the recommendation of the Chapter 5 from “Programming Perl”. It’s nice to see references from people whose blogs/posts have helped me out. Knowing that they (you) learned what I was searching for from somewhere else, it’s nice to get a hint as to where that was.
