Tome's Land of IT

IT Notes from the Powertoe – Tome Tanasovski


Linux folks meet piped objects, Microsoft folks meet sed!

The world is abuzz with the announcement that Microsoft has open sourced PowerShell and released a working version of the language for Mac and Linux.


Personally, I’ve been looking forward to this for a long time.  While I believe that PowerShell is a great language for development, I’m NOT excited about that aspect of it on Linux.  I think it will be a long time before .NET Core is tuned to the point where PowerShell performs comparably to Python as an interpreted language, but I do have hope.  No, the reason I have wanted this is that it is a super booster for bash!

Bash Booster – Objects in the pipe

I cannot tell you how many times I’ve tinkered in a Linux shell over the last few years and cursed the fact that I didn’t simply have objects.  Sure I can represent objects in CSVs or JSON, but to truly interact with them on a non-text level is frustrating in a shell after you’ve used PowerShell for so long.  In PowerShell, this is the way the world works:

Invoke-WebRequest http://blah.blah.blah |
ConvertFrom-Json |where {$_.size -gt 100} |select name, size |Export-Csv output.csv

It’s all about data manipulation.  Grab data, filter, select properties to create new data sets, and output that data.  Additionally, you often process that data inline, for example:

...| ConvertFrom-Json| select @{name='SizeMB';expression={$_.size/1MB}}| ...

And because everything is objects, you can easily write your own parsers, manipulators, or outputters that are duck-typed to work with the objects coming in.  It’s truly a game changer in the shell.

Native Linux Commands to PowerShell? – Hello sed!

For those unfamiliar with Linux, or Linux users looking for the quickest way to convert text into PowerShell objects, I’m here to tell you that sed is your friend.  Basically, the process is to turn your Linux output into CSV with sed and then use ConvertFrom-Csv to turn the CSV into PowerShell objects.  Of course, this assumes the command has no built-in way to switch its output to CSV or JSON.  If such a switch exists, it is always the best way to go.  For this article we’re talking about turning pure text output into objects.

We’re going to use the -r option of sed, which enables extended regular expressions, so that operators such as + work without backslash escaping.  We’re also going to use the form s/regex/replace/g, which says: (s)ubstitute each match of regex with replace, (g)lobally, meaning at every match on a line rather than only the first.
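
As a minimal sketch of what the g flag changes, the snippet below runs the same substitution with and without it on one sample line in the style of ps -f output. It assumes GNU sed, where -r (or its synonym -E) enables extended regexes and \s is accepted as a whitespace class. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Sample line in the style of ps -f output.
line='tome     22528  4516  0 09:42 pts/2    00:00:00 -bash'

# Without the g flag, only the first run of whitespace is replaced.
first_only=$(printf '%s\n' "$line" | sed -r 's/\s+/,/')

# With the g flag, every run of whitespace on the line is replaced.
all_runs=$(printf '%s\n' "$line" | sed -r 's/\s+/,/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_runs"
```

The first line printed still contains the original spacing after the first comma, while the second is fully comma-separated.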

For this example, we’ll look at the output of ps -f:

10:06:04 PS /> ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
tome     22528  4516  0 09:42 pts/2    00:00:00 -bash
tome     22689 22528  0 09:45 pts/2    00:00:04 powershell
tome     22760 22689  0 10:06 pts/2    00:00:00 /bin/ps -f

As you can see, the fields in each record are separated by runs of whitespace. We can parse this by replacing each run with a single comma. To do this with sed, we do the following:

10:08:12 PS /> ps -f |sed -r 's/\s+/,/g'
UID,PID,PPID,C,STIME,TTY,TIME,CMD
tome,22528,4516,0,09:42,pts/2,00:00:00,-bash
tome,22689,22528,0,09:45,pts/2,00:00:04,powershell
tome,22774,22689,0,10:08,pts/2,00:00:00,/bin/ps,-f

For now, let’s just ignore the fact that the -f is showing up with a comma in the last line. There is a fix to that which I will give an example of, but for now, let’s just convert this to PowerShell:

10:27:24 PS /> ps -f |sed -r 's/\s+/,/g' |ConvertFrom-Csv |select uid, cmd

UID  CMD
---  ---
tome -bash
tome powershell
tome /bin/ps


10:27:49 PS /> ps -f |sed -r 's/\s+/,/g' |ConvertFrom-Csv |Format-Table

UID  PID   PPID  C STIME TTY   TIME     CMD
---  ---   ----  - ----- ---   ----     ---
tome 22528 4516  0 09:42 pts/2 00:00:00 -bash
tome 22689 22528 0 09:45 pts/2 00:00:09 powershell
tome 23058 22689 0 10:28 pts/2 00:00:00 /bin/ps

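If you really need to ensure that the spaces in cmd are preserved, there is a good Stack Overflow discussion about it here, but the shortcut is to do something like the following. The trailing 8 tells sed to replace only the eighth run of whitespace on each line, which is the first space inside CMD, with the placeholder XXX. The remaining runs become commas, and the placeholder is then turned back into a space: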

10:29:55 PS /> ps -f |sed -r 's/\s+/XXX/8' |sed -r 's/\s+/,/g' |
sed -r 's/XXX/ /g' |ConvertFrom-Csv |Format-Table

UID  PID   PPID  C STIME TTY   TIME     CMD
---  ---   ----  - ----- ---   ----     ---
tome 22528 4516  0 09:42 pts/2 00:00:00 -bash
tome 22689 22528 0 09:45 pts/2 00:00:09 powershell
tome 23084 22689 0 10:30 pts/2 00:00:00 /bin/ps -f
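
The command above protects only one space inside CMD. As a sketch of how it might be extended to commands with several arguments, the variant below relies on GNU sed's documented behavior for combining a number with the g flag (skip the matches before that number, then replace that match and every later one). This combination is GNU-specific and not defined by POSIX, and the sample command line is made up for illustration.

```shell
#!/usr/bin/env bash
# Sample ps -f style text; the last command has two spaces in it (illustrative only).
sample='UID        PID  PPID  C STIME TTY          TIME CMD
tome     22760 22689  0 10:06 pts/2    00:00:00 /bin/ps -f --forest'

# 8g (GNU sed): leave the first seven whitespace runs alone and replace
# the eighth and every later run with the XXX placeholder.
# Then comma-separate the remaining runs and turn XXX back into a space.
csv=$(printf '%s\n' "$sample" |
  sed -r -e 's/\s+/XXX/8g' -e 's/\s+/,/g' -e 's/XXX/ /g')

printf '%s\n' "$csv"
```

The header line has only seven whitespace runs, so it is simply comma-separated, while the data line keeps "/bin/ps -f --forest" together as a single CMD field. A command that itself contains a comma or the text XXX would still break this approach.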

One final note: if you are new to PowerShell from Linux, the format commands (such as Format-Table) always go at the end of a pipeline. They are designed only to change how output is displayed on the screen. They replace your objects with formatting data, which is probably not what you want if another command follows the format command in the pipe.

What about headerless commands such as ls -l?

If you look at the output of ls -l and the corresponding output of the CSV file from a sed, it looks like this:

10:35:05 PS /home/tome> ls -l
total 4
-rw-rw-r--  1 tome tome    0 Aug 19 10:33 file1
-rw-rw-r--  1 tome tome    0 Aug 19 10:34 file2
drwxrwxr-x 11 tome tome 4096 Aug 17 10:36 PowerShell
10:35:08 PS /home/tome> ls -l |sed -r 's/\s+/,/g'
total,4
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell

There are two problems with the above.  First, there is an extra line that has no relevant info.  Second, there is no header to tell PowerShell what the property names are for the objects.

Skipping a line with Select

Skipping the total line is easy using the -Skip parameter of Select-Object:

10:35:10 PS /home/tome> ls -l |sed -r 's/\s+/,/g' |select -Skip 1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell

Adding a custom header

ConvertFrom-Csv has a -Header parameter that lets you supply the list of column names to use when the input has no header row of its own.  Here is how you can use it to convert the output of ls -l to actual PowerShell objects:

10:43:33 PS /home/tome> ls -l |sed -r 's/\s+/,/g' |select -Skip 1 |
ConvertFrom-Csv -Header @('mode','count','user','group','size','month','day','time','name') |Format-Table

mode       count user group size month day time  name
----       ----- ---- ----- ---- ----- --- ----  ----
-rw-rw-r-- 1     tome tome  0    Aug   19  10:33 file1
-rw-rw-r-- 1     tome tome  0    Aug   19  10:34 file2
drwxrwxr-x 11    tome tome  4096 Aug   17  10:36 PowerShell
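
Both steps can also be done on the text side before PowerShell sees the data. The sketch below is one possible approach, not the method used above: sed's 1d command deletes the total line, and a header row is printed ahead of the data so that the result is ordinary CSV. It assumes GNU sed, and the sample text and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sample text in the style of the ls -l output shown above.
sample='total 4
-rw-rw-r--  1 tome tome    0 Aug 19 10:33 file1
drwxrwxr-x 11 tome tome 4096 Aug 17 10:36 PowerShell'

# Header names match the -Header list used above.
header='mode,count,user,group,size,month,day,time,name'

# 1d deletes the "total" line; the substitution then turns
# each whitespace run on the remaining lines into a comma.
csv=$(printf '%s\n' "$header"
      printf '%s\n' "$sample" | sed -r -e '1d' -e 's/\s+/,/g')

printf '%s\n' "$csv"
```

With a header row already in the text, the result can be handed to ConvertFrom-Csv without -Header. File names containing spaces would still be split across columns.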

 

Alternative to sed

Alternatively, you can use PowerShell in place of sed for replacing string contents. The pattern is generally like this:

10:43:55 PS /home/tome> ls -l |select -Skip 1 |%{$_ -replace '\s+', ','}
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell

Summary

PowerShell adds a lot more tools to your belt.  When it comes to data manipulation, the paradigm shift to objects over text is a game changer.  I’m personally really happy to have this flexibility and I can’t wait to take advantage of it!

 
