Tome's Land of IT

IT Notes from the Powertoe – Tome Tanasovski

H2o – Machine Learning with PowerShell

This article will give you the basics to leverage H2o to create a neural net model in seconds and then show you how to consume that model all from Windows PowerShell.  We’ll assume you do NOT know what H2o is nor do you know anything about machine learning other than it exists.  It’s going to rely on the REST API as there is no module yet for H2o (HINT HINT – Great easy project if anyone wants to take it on).

But, machine learning is hard, isn’t it?  Don’t I need high-level mathematics and a degree?

A few years ago I took the flagship Stanford course on machine learning from Coursera and have a basic understanding of how the different algorithms and implementations work.  However, without day-to-day practice, the information slowly leaked from my brain.  Historically, the barrier to entry to implement machine learning has been a bit too much.  Even with libraries that implement the various algorithms, it simply required a ton of work to even get to the point where you could train a set of data.  Tuning that usually required graduate-level understanding of each algorithm and the math behind it.  If you’ve toyed in this space without getting serious, then you probably have a similar story.  All of that changed for me when starting to play with the python library SKLearn (won’t be discussing today) and the cross-platform open environment for analytics/machine learning called H2o.

Full disclosure: there is still some art to tweaking the algorithms and data to get an optimal data model, but H2o makes it much easier.  For now, don’t worry about that.  I promise you’ll have a deep learning neural net training a data model in only a few minutes.

What is H2o?

You can read all about it on their website, but in my mind, H2o is an open platform to easily train data models, tweak them, and finally put them into production.  It comes with a set of libraries for Python, Java, and R, it has a REST API (which we will leverage today), and it even has a web GUI called Flow if you want to stay out of code entirely.  Personally, I find Flow great, but I use it mostly for examining data I’ve loaded or to quickly do something when I don’t feel like looking up the API/lib call.  When using H2o with PowerShell, Flow becomes essential because the API maps directly onto what you can do in Flow.

Isn’t Machine Learning All About Big Data?

No!  People have lost their minds when it comes to data.  Just because people have lots of data doesn’t mean you need that data.  Depending on the number of columns or properties your data has and how efficiently you store data in your data sets (ideally, just the numbers), you can fit 5k rows in only ~200 KB of disk space.  A million rows is about 40 MB.  As long as you can fit your data set in a single server’s memory, you are outside of the realm of big data.  Whoever is telling you otherwise is trying to sell you something.  Don’t get me wrong, you can run machine learning against large data sets, but most problems are not big data problems.
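
The back-of-the-envelope arithmetic behind those numbers can be checked in PowerShell itself.  This is only a sketch that assumes the ~200 KB for 5k rows figure quoted above; the variable names are made up for illustration.

```powershell
# Rough sizing from the figures quoted above (5k rows in ~200 KB).
$bytesPerRow   = 200KB / 5000                 # about 41 bytes per row
$millionRowsMB = ($bytesPerRow * 1e6) / 1MB   # size of a million such rows, in MB

"{0:N1} bytes per row" -f $bytesPerRow
"{0:N1} MB for a million rows" -f $millionRowsMB
```

Your own row size depends entirely on how many columns you keep and how they are stored, so treat the result as an order of magnitude.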

The Iris Data

Today, we will be using one of the most famous sets of open data, the Iris data.  You can read about the data and its properties here, but at a high level, there is a bunch of data about flower dimensions (petal length/width & sepal length/width) and a classification into 3 Iris species.  The goal is to use the data to create a model that predicts the species of Iris from the dimensions given as input.

Quick start H2o

H2o is a complete running service that you will need to start on your computer with Java.

    1. Download java (if you don’t have it already)
    2. Download H2o and extract the h2o-version folder directly into c:\h2o
    3. Run H2o in PowerShell:
Java.exe -jar c:\h2o\h2o.jar

Alternatively, this will work as a background job if you want to keep everything in a single process:

Start-job -scriptblock {java.exe -jar c:\h2o\h2o.jar}

You can now browse to http://localhost:54321 to open the H2o Flow interface.
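
If you would rather confirm from PowerShell that the service is up, something like the following sketch may help.  It assumes the default port used above and the v3 Cloud endpoint with a cloud_healthy field in its response; Test-H2oRunning is a hypothetical helper name, not part of H2o.

```powershell
function Test-H2oRunning {
    param([string] $BaseUrl = 'http://localhost:54321')
    try {
        # Assumption: GET /3/Cloud returns a JSON status object with cloud_healthy.
        $cloud = Invoke-RestMethod "$BaseUrl/3/Cloud" -TimeoutSec 5
        return [bool]$cloud.cloud_healthy
    } catch {
        # Connection refused, timeout, or a non-H2o service on that port.
        return $false
    }
}

Test-H2oRunning
```

This is handy right after Start-Job, since the JVM can take a few seconds before it answers requests.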

Step 1 Prep the Data

Prepping the data will proceed as follows:

  1. Import the Data
  2. Parse the Data
  3. Split the Data into Train and Test data

Import the Data

The API call for this is pretty straightforward. It allows you to pass a file path or URL to the ImportFiles function.

$url = "http://localhost:54321/3/{0}"
$iris_url = 'https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/iris_wheader.csv'

"Import the IRIS data"
$importfiles_url = $url -f "ImportFiles"
$importfiles_body = "path=$iris_url"
$ret = Invoke-RestMethod $importfiles_url -Method POST -Body $importfiles_body

Parse the Data

This step is about having H2o interpret the data format and load it into a native H2o Data Frame. Oddly, this step is the most complex to navigate in the API even though, with all of the defaults, H2o does a perfect job of parsing the data. The challenge is that you must first call a function to autodetect the Parse settings, and then pass the output from that function to the Parse function. This brings us to the biggest issue with the API as it lives today in v3: the POST data does not accept JSON, but all of the return data from the API is JSON. The team promises to introduce JSON data natively in the near future, but until then you must post data in key/value form as specified by application/x-www-form-urlencoded. Unfortunately, I found no native library that marshals H2o's JSON output back into that form, so I wrote a quick helper function to do a single level of conversion for you:

function ConvertTo-FormData {
	param(
		  [Parameter(ValueFromPipeline=$true)] [PSObject] $InputObject
		 )
	Begin {
		$output = ""
	}
	Process {
		foreach ($prop in $InputObject.psobject.properties |select -expandproperty name) {
			if ($InputObject.($prop).gettype().name -eq "Boolean") {
				# H2o expects lowercase true/false
				if ($InputObject.($prop)) {
					$output += "$prop=true&"
				} else {
					$output += "$prop=false&"
				}
			} elseif ($InputObject.($prop).gettype().isarray) {
				# hacky for h2o collections
				if ($InputObject.($prop).name) {
					$output += "$prop=[{0}]&" -f ($InputObject.($prop).name -join ",")
				} else {
					$output += "$prop=[{0}]&" -f ($InputObject.($prop) -join ",")
				}
			}
			else {
				$output += "$prop=" + $InputObject.($prop) + "&"
			}
		}
	}
	End {
		# Drop the trailing ampersand
		$output.Remove($output.Length-1,1)
	}
}
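
For calls that only take flat, scalar parameters (ImportFiles, for example), a helper may not be needed at all: on a POST, Invoke-RestMethod sends a hashtable body as name=value form fields.  The sketch below assumes H2o is running locally as set up earlier; Import-H2oFile is a hypothetical wrapper name, and H2o's bracketed collections still need the kind of conversion shown above.

```powershell
function Import-H2oFile {
    param(
        [Parameter(Mandatory=$true)] [string] $Path,
        [string] $BaseUrl = 'http://localhost:54321'
    )
    # A hashtable body on a POST is form-encoded by Invoke-RestMethod itself.
    Invoke-RestMethod "$BaseUrl/3/ImportFiles" -Method Post -Body @{ path = $Path }
}
```

Calling Import-H2oFile 'c:\h2o\predict.csv' should return the same kind of response object as the ImportFiles call shown earlier.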

With this function in hand, calling ParseSetup and Parse is very straightforward once you know the parameters.  The parameters for any command can usually be worked out by combining the input schema in the documentation (located inside of Flow under “Help”) with the widgets in Flow, which show the parameters the function is going to need.

Here it is in PowerShell code:

"Run parse setup to find out how H2o thinks it should parse the data"
$parsesetup_url = $url -f "ParseSetup"
$parsesetup_body = 'source_frames=[{0}]' -f $iris_url
$ret = Invoke-RestMethod $parsesetup_url -Method Post -Body $parsesetup_body

"Parse the data into a real H2o dataframe"
$parse_url = $url -f "Parse"
$parse_body = $ret | select source_frames,parse_type,separator,number_columns,single_quotes,column_names,column_types,check_header,chunk_size |ConvertTo-FormData
$parse_body += "&destination_frame=iris&delete_on_done=true"
$ret = Invoke-RestMethod $parse_url -Method Post -Body $parse_body

In the above, the destination_frame is the name of the data as it will now live on H2o. Once the data is imported, you can see what it looks like in H2o Flow with this common name.

Async Polling

Some of the functions are run asynchronously. The Parse function is one of those. When H2o does something in this manner, it will provide you with a job id that you may poll. Unfortunately there are no callbacks available or web sockets to tap into. Here’s a helper function to do the polling:

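function Wait-H2oJob {
	param(
		  [Parameter(Mandatory=$true, Position=0)]
		  [String] $JobPath
	)
	# Poll the job URL every 500 ms until H2o reports a terminal status
	$notdone = $true
	while ($notdone) {
		$status = invoke-restmethod ("http://localhost:54321" + $JobPath) |select -ExpandProperty jobs |select -ExpandProperty status
		$status
		if ($status -eq "DONE") {
			$notdone = $false
		} elseif ($status -in "FAILED","CANCELLED") {
			# Stop instead of polling forever when the job will never reach DONE
			throw "H2o job $JobPath ended with status $status"
		} else {
			sleep -Milliseconds 500
		}
	}
}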

Finally, with this function in hand, you can see when the Parse function is done with the following:

wait-H2oJob $ret.job.key.URL

Caveat: the job key does not always return in exactly the same properties. You’ll see this in the future async calls we are about to make.
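
One way to live with that caveat is a tiny helper that looks in both places.  This is only a sketch based on the two shapes used in this article (.job.key.URL and .key.URL); Get-H2oJobPath is a hypothetical name, and other endpoints may return the key somewhere else entirely.

```powershell
function Get-H2oJobPath {
    param([Parameter(Mandatory=$true)] $Response)
    # Shape 1: the job is nested, e.g. $ret.job.key.URL
    if ($Response.job -and $Response.job.key) { return $Response.job.key.URL }
    # Shape 2: the key sits at the top level, e.g. $ret.key.URL
    if ($Response.key) { return $Response.key.URL }
    throw "No job key found in the response"
}

# Stand-in responses for illustration; the URL values are made up.
$nested = [pscustomobject]@{ job = [pscustomobject]@{ key = [pscustomobject]@{ URL = '/3/Jobs/example_a' } } }
$flat   = [pscustomobject]@{ key = [pscustomobject]@{ URL = '/3/Jobs/example_b' } }
Get-H2oJobPath $nested
Get-H2oJobPath $flat
```

With this in place, each async call can be followed by Wait-H2oJob (Get-H2oJobPath $ret) regardless of which shape came back.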

Split the Data

Even though there are only 150 rows in the dataset, we’re going to split the data so that we can train against a portion of the data and validate how well the model works against the other portion of the data. The following code will put 90% of the data into a new data frame named “train” and 10% of the data into a data frame named “test”:

"Split the data into a 90% training set and a 10% testing set"
$splitframe_url = $url -f "SplitFrame"
$splitframe_body = "dataset=iris&ratios=[.90,.1]&destination_frames=[train,test]"
$ret = invoke-restmethod $splitframe_url -Method Post -Body $splitframe_body
wait-H2oJob $ret.key.URL

Step 2 – Build the Model

It may not look like much, but this is all you need to implement a machine learning neural net in H2o:

$deeplearning_url = $url -f "ModelBuilders/deeplearning"
$deeplearning_body = 'training_frame=train&response_column=class&model_id=neural'
$ret = invoke-restmethod $deeplearning_url -Method Post -Body $deeplearning_body
wait-H2oJob $ret.job.key.URL

In the above, we are saying to use the data set named “train” and that we will be using the data in the columns to predict the column named “class”. Finally, we’ll be naming this data model “neural” to make it easy to access and view in H2o Flow.
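
When you do start tuning, extra parameters are just more key/value pairs in the same body.  The sketch below builds such a body from an ordered hashtable; hidden and epochs are deep learning parameters, but the values here are arbitrary examples and neural_tuned is a hypothetical model name.

```powershell
# Arbitrary example values, not recommendations.
$params = [ordered]@{
    training_frame  = 'train'
    response_column = 'class'
    model_id        = 'neural_tuned'   # hypothetical model name
    hidden          = '[10,10]'        # two hidden layers of 10 neurons each
    epochs          = 50
}

# Join the pairs into the key=value&key=value form the API expects.
$deeplearning_body = ($params.GetEnumerator() |
    ForEach-Object { '{0}={1}' -f $_.Key, $_.Value }) -join '&'
$deeplearning_body
```

The resulting string can be posted to the same ModelBuilders/deeplearning URL used above.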

Step 3 – Use the Model to Predict Against the Test Data

Finally, we’ll call the Predictions function against the Test data. We’ll then look at the MSE or mean squared error to see how well the model fits the additional data. The lower this number, the better.

$predict_url = $url -f "Predictions/models/neural/frames/test"
$ret = invoke-restmethod $predict_url -method POST -Body "predictions_frame=predicted_test_data"
$ret.model_metrics |select -expandproperty MSE

Later, when truly exploring the power of H2o, you will use this value to compare how well models are doing (and how badly you may be overfitting your model to your data).  The output looks like this:

0.14980011539780046
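
If mean squared error is new to you, the general idea can be sketched with a few toy numbers.  These values are invented for illustration and are not H2o output; H2o computes its own metric for the model, and this only shows the textbook definition (the average of the squared differences between actual and predicted values).

```powershell
# Toy values: 1 means "is this class", predictions are probabilities.
$actual    = 1, 0, 1
$predicted = 0.9, 0.2, 0.7

# Square each error, then average them.
$squaredErrors = for ($i = 0; $i -lt $actual.Count; $i++) {
    [math]::Pow($actual[$i] - $predicted[$i], 2)
}
$mse = ($squaredErrors | Measure-Object -Average).Average
$mse
```

Predictions that sit closer to the actual values produce smaller squares, which is why a lower MSE indicates a better fit.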

Step 4 – Use the model for new data

Basically, by repeating the steps above of importing new data, you may predict against this data using the data model that now lives in H2o that we created named “neural”:

"Load something you want to predict"
@"
sepal_len, sepal_wid, petal_len, petal_wid
5.1,3.5,1.4,0.15
"@ |out-file -encoding ASCII c:\h2o\predict.csv
$importfiles_url = $url -f "ImportFiles"
$importfiles_body = "path=c:\h2o\predict.csv"
$ret = Invoke-RestMethod $importfiles_url -Method POST -Body $importfiles_body

"Run parse setup to find out how H2o thinks it should parse the data"
$parsesetup_url = $url -f "ParseSetup"
$parsesetup_body = 'source_frames=[{0}]' -f $ret.destination_frames[0]
$ret = Invoke-RestMethod $parsesetup_url -Method Post -Body $parsesetup_body

"Parse the data into a real H2o dataframe"
$parse_url = $url -f "Parse"
$parse_body = $ret | select source_frames,parse_type,separator,number_columns,single_quotes,column_names,column_types,check_header,chunk_size |ConvertTo-FormData
$parse_body += "&destination_frame=predictme&delete_on_done=true"
$ret = Invoke-RestMethod $parse_url -Method Post -Body $parse_body
Wait-H2oJob $ret.job.key.URL

"Let's leverage the data model we built earlier to predict against this new data frame"
$predict_url = $url -f "Predictions/models/neural/frames/predictme"
$ret = invoke-restmethod $predict_url -method POST -Body "predictions_frame=predictme_results"

Fortunately, H2o also gives you access to the raw data so that you can inspect the results of the prediction. The following will spit out the data in the data frame named predictme_results where we are storing the prediction results:

$results_url = $url -f "Frames/predictme_results"
$ret = invoke-restmethod $results_url
$ret.frames.columns |select label, data

The output of the above looks something like the following.

label           data
-----           ----
predict         {0.0}
Iris-setosa     {0.9999987226382198}
Iris-versicolor {1.2773201404672E-06}
Iris-virginica  {4.16397577662733E-11}

This means the model assigns a probability of roughly 99.9999% to this Iris belonging to the species Iris-setosa. That’s some serious confidence that we have identified our flower!
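
To pull the winning species out of that table in code, you can sort the probability columns.  The sketch below uses a hand-built stand-in for the label/data objects (values copied from the output above) so that it runs without H2o; in practice you would pipe the real columns in instead.

```powershell
# Stand-in for: $ret.frames.columns | select label, data
$columns = @(
    [pscustomobject]@{ label = 'predict';         data = @(0.0) }
    [pscustomobject]@{ label = 'Iris-setosa';     data = @(0.9999987226382198) }
    [pscustomobject]@{ label = 'Iris-versicolor'; data = @(1.2773201404672E-06) }
    [pscustomobject]@{ label = 'Iris-virginica';  data = @(4.16397577662733E-11) }
)

# Ignore the summary column and keep the class with the highest probability.
$best = $columns |
    Where-Object { $_.label -ne 'predict' } |
    Sort-Object { $_.data[0] } -Descending |
    Select-Object -First 1

'{0} ({1:P4})' -f $best.label, $best.data[0]
```

Only the first row of each column is inspected here, which matches the single-row predict.csv used in this step.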

The Code Completely

All of the above code is available in a single script along with the MIT license for you to reuse every bit of it as you see fit.

What’s next?  Learn H2o!

Learn H2o – The best book I’ve played with is the O’Reilly book, “Practical Machine Learning with H2o”.  All of the code is in Python, but it’s more important to learn the interfaces and the details of the various parameters available with each algorithm in H2o.  As a PowerShell developer, you should be able to follow along.  If you can’t, it may be time to learn a little Python too – it may be an inferior language to PowerShell, but it is leaps and bounds ahead of PowerShell in terms of ecosystem and libraries – it also has great visual code editing tooling with Jupyter and Zeppelin if you want to start getting serious with machine learning or analytics.

Just as important will be attempting to use H2o via Flow a little more.  The API is completely reflected by the way that Flow works.  Additionally, Flow’s help section has links to the API to learn the interfaces as well as the input/output schemas as you need to do more with H2o and PowerShell.

What’s next?  Grid Searches & AutoMLBuilder!

The best things to look at while reading about H2o are grid searches to do parameter tuning of data models and the AutoMLBuilder which will try all of the algorithms and do parameter tuning automatically for you.  However, both of these are more interesting when you get to Sparkling Water.

What’s Next?  Sparkling Water!

Spark was originally designed to pull in large sets of data from Hadoop into a distributed cluster of servers all in memory.  This sped up the processing you could do with Hadoop data.  Sparkling Water is the library that connects H2o to Spark.  It also lets you run H2o on a Spark cluster.  If you want to take the technique learned in this article to a distributed platform with thousands of cores churning, Sparkling Water does this for you without any additional work.  It’s especially effective when trying to find optimal parameters for the algorithm arguments.  This is generally very compute intensive because grid searches and the AutoMLBuilder are basically brute force attempts to find optimal sets of parameters.  They naturally parallelize very nicely.

What’s Next? Productionize a Model Without Keeping H2o Running

Finally, H2o is a great tool for building models, but after the model is built you probably want something a little lower in weight.  The answer is to download the POJO/MOJO.  In an upcoming non-PowerShell post, I’ll share my techniques for using the POJOs outside of H2o to create very fast REST services with a GUI that can be called and scaled as needed.

Techstravaganza – PowerShell Track – Call to Speakers

Techstravaganza 2017 is in NY on April 28

You may or may not know, but I am a cofounder of Techstravaganza in NY.  I also manage the PowerShell track.  We are having our call to speakers for the event on 4/28.  That’s not a lot of time and I have no one to blame but myself.  However, we do have money to fly folks to NY for the day.  So please read, review, consider, and send us your abstracts.

Topics

While Techstravaganza is typically an IT pro conference, our PowerShell track does not always conform to that.  We seek level 300/400 content that is deep and interesting.  All topics and levels will be considered.  Please feel free to submit as many abstracts as you wish for consideration.  The more you submit, the better your chance that the group will decide on one of your topics.

What to submit:

Title: A catchy title for your talk
Abstract: A paragraph describing your talk that will be used on our website and track materials
Speaker Bio: A brief bio explaining who you are and any relevant twitter handles or blog links that you would like linked to from our website.

Please e-mail powershellnyc at gmail with the above

Timelines

Deadline 3/19
Decision 3/22

Expenses

For those travelling from outside of NYC, Techstravaganza typically reimburses travel expenses.  We guarantee up to 500 dollars, but for the past 6 years we have been able to reimburse all expenses for our speakers.  The exact amount is dependent upon the origin of all speakers coming to the event.  Also, please note, these are reimbursements for airfare/hotel/taxi.  We do not cover additional expenses or provide an honorarium or stipend of any kind.

Sponsor Requests

If you are a sponsor interested in sponsoring Techstravaganza, please also e-mail powershellnyc at gmail.com so that we may send you the list of sponsorship opportunities.

Linux folks meet piped objects, Microsoft folks meet sed!

The world is abuzz with the announcement that Microsoft has open sourced PowerShell and released a working version of the language for Mac and Linux.

Personally, I’ve been looking forward to this for a long time.  While I believe that PowerShell is a great language for development, I’m NOT excited about that aspect of it on Linux.  I think it will be a long time before the core dotnet becomes tuned to a point where it would perform comparably to Python as an interpreted language, but I do have hope.  No, the reason I have wanted this is because it is a super booster to bash!

Bash Booster – Objects in the pipe

I cannot tell you how many times I’ve tinkered in a Linux shell over the last few years and cursed the fact that I didn’t simply have objects.  Sure I can represent objects in CSVs or JSON, but to truly interact with them on a non-text level is frustrating in a shell after you’ve used PowerShell for so long.  In PowerShell, this is the way the world works:

Invoke-Webrequest http://blah.blah.blah |
ConvertFrom-Json|where {$_.size -gt 100} |select name, size |export-csv output.csv

It’s all about data manipulation.  Grab data, filter, select properties to create new data sets, and output that data.  Additionally, you often process that data inline, for example:

...| ConvertFrom-Json| select @{name='SizeMB';expression={$_.size/1MB}}| ...

And because everything is objects, you can easily write your own parsers, manipulators, or outputters that are duck-typed to work with the objects coming in.  It’s truly a game changer in the shell.

Native Linux Commands to PowerShell? – Hello sed!

For those unfamiliar with Linux or those who use Linux who are looking for the quickest way to convert your text into PowerShell objects, I’m here to tell you that sed is your friend.  Basically, the process is to turn your Linux output into CSV with sed and then use ConvertFrom-CSV to turn the CSV into PowerShell objects.  Of course, this assumes there isn’t a built-in way to switch the output of the command to CSV or JSON.  If there is a switch that exists to do so, it is always the best way to go.  For this article we’re talking about pure text output to objects.

We’re going to use sed’s -r option, which enables extended regular expressions and gives you more to work with in the regex.  We’re also going to use the form s/regex/replace/g, which basically says (s)ubstitute whatever matches regex with replace, (g)lobally across each input line.

For this example, we’ll look at the output of ps -f:

10:06:04 PS /> ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
tome     22528  4516  0 09:42 pts/2    00:00:00 -bash
tome     22689 22528  0 09:45 pts/2    00:00:04 powershell
tome     22760 22689  0 10:06 pts/2    00:00:00 /bin/ps -f

As you can see, there are some spaces or tabs between each field. We can easily parse the contents of this and replace those spaces with a comma. To do this with sed, we do the following:

10:08:12 PS /> ps -f |sed -r 's/\s+/,/g'
UID,PID,PPID,C,STIME,TTY,TIME,CMD
tome,22528,4516,0,09:42,pts/2,00:00:00,-bash
tome,22689,22528,0,09:45,pts/2,00:00:04,powershell
tome,22774,22689,0,10:08,pts/2,00:00:00,/bin/ps,-f

For now, let’s just ignore the fact that the -f is showing up with a comma in the last line. There is a fix to that which I will give an example of, but for now, let’s just convert this to PowerShell:

10:27:24 PS /> ps -f |sed -r 's/\s+/,/g' |ConvertFrom-Csv |select uid, cmd

UID  CMD
---  ---
tome -bash
tome powershell
tome /bin/ps


10:27:49 PS /> ps -f |sed -r 's/\s+/,/g' |ConvertFrom-Csv |Format-Table

UID  PID   PPID  C STIME TTY   TIME     CMD
---  ---   ----  - ----- ---   ----     ---
tome 22528 4516  0 09:42 pts/2 00:00:00 -bash
tome 22689 22528 0 09:45 pts/2 00:00:09 powershell
tome 23058 22689 0 10:28 pts/2 00:00:00 /bin/ps

If you really need to ensure that the spaces in cmd are preserved, there is a good stack overflow discussion about it here, but the shortcut outcome would be to do something like this:

10:29:55 PS /> ps -f |sed -r 's/\s+/XXX/8' |sed -r 's/\s+/,/g' |
sed -r 's/XXX/ /g' |ConvertFrom-Csv |Format-Table

UID  PID   PPID  C STIME TTY   TIME     CMD
---  ---   ----  - ----- ---   ----     ---
tome 22528 4516  0 09:42 pts/2 00:00:00 -bash
tome 22689 22528 0 09:45 pts/2 00:00:09 powershell
tome 23084 22689 0 10:30 pts/2 00:00:00 /bin/ps -f
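
If you would rather stay in PowerShell for this case, the -split operator accepts a maximum number of substrings, which keeps everything after the seventh gap together in CMD.  This is a sketch of an alternative rather than the approach used above; the sample lines are copied from the earlier ps -f output so that it runs anywhere.

```powershell
# Sample lines from the ps -f output shown earlier; on Linux, use: $lines = ps -f
$lines = @(
    'UID        PID  PPID  C STIME TTY          TIME CMD'
    'tome     22528  4516  0 09:42 pts/2    00:00:00 -bash'
    'tome     22760 22689  0 10:06 pts/2    00:00:00 /bin/ps -f'
)

$header = $lines[0] -split '\s+'
$procs = $lines | Select-Object -Skip 1 | ForEach-Object {
    # Limit the split to 8 fields so spaces inside CMD are preserved.
    $fields = $_ -split '\s+', 8
    $row = [ordered]@{}
    for ($i = 0; $i -lt $header.Count; $i++) { $row[$header[$i]] = $fields[$i] }
    [pscustomobject]$row
}
$procs | Select-Object UID, PID, CMD
```

Unlike the XXX placeholder trick, this keeps any number of spaces in the command intact, at the cost of assuming the first seven columns never contain whitespace.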

One final note: if you are new to PowerShell from Linux, the format commands always go at the end of an object chain. They are designed to change the display output to the screen; otherwise, you should not use them. They replace your objects with formatting data, which is likely not what you want if you have a pipe after the format command.
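
A quick way to see this for yourself is to look at the types that come out of a format command.  The sketch below uses a made-up object; the exact list of formatting types is an internal detail, so only the general effect matters.

```powershell
$item = [pscustomobject]@{ name = 'file1'; size = 0 }

# After Format-Table the pipeline carries formatting records, not the original object.
$typesAfterFormat = $item | Format-Table | ForEach-Object { $_.GetType().Name }
$typesAfterFormat

# Select-Object keeps real objects, so later stages can still read .name and .size.
($item | Select-Object name, size).name
```

That is why filtering, sorting, or exporting should happen before any Format-* command, never after it.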

What about headerless commands such as ls -l?

If you look at the output of ls -l and the corresponding output of the CSV file from a sed, it looks like this:

10:35:05 PS /home/tome> ls -l
total 4
-rw-rw-r--  1 tome tome    0 Aug 19 10:33 file1
-rw-rw-r--  1 tome tome    0 Aug 19 10:34 file2
drwxrwxr-x 11 tome tome 4096 Aug 17 10:36 PowerShell
10:35:08 PS /home/tome> ls -l |sed -r 's/\s+/,/g'
total,4
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell

There are two problems with the above.  First, there is an extra line that has no relevant info.  Second, there is no header to tell PowerShell what the property names are for the objects.

Skipping a line with Select

Skipping the total line is easy using the skip argument to Select-Object:

10:35:10 PS /home/tome> ls -l |sed -r 's/\s+/,/g' |select -Skip 1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell

Adding a custom header

ConvertFrom-CSV has a specific argument called header that allows you to supply a list that makes up what would be the header found in a CSV if one does not exist.  Here is how you can use it to convert the output of ls -l to actual PowerShell objects:

10:43:33 PS /home/tome> ls -l |sed -r 's/\s+/,/g' |select -Skip 1 |
ConvertFrom-Csv -Header @('mode','count','user','group','size','month','day','time','name') |Format-Table

mode       count user group size month day time  name
----       ----- ---- ----- ---- ----- --- ----  ----
-rw-rw-r-- 1     tome tome  0    Aug   19  10:33 file1
-rw-rw-r-- 1     tome tome  0    Aug   19  10:34 file2
drwxrwxr-x 11    tome tome  4096 Aug   17  10:36 PowerShell

 

Alternative to sed

Alternatively, you can use PowerShell in place of sed for replacing string contents. The pattern is generally like this:

10:43:55 PS /home/tome> ls -l |select -Skip 1 |%{$_ -replace '\s+', ','}
-rw-rw-r--,1,tome,tome,0,Aug,19,10:33,file1
-rw-rw-r--,1,tome,tome,0,Aug,19,10:34,file2
drwxrwxr-x,11,tome,tome,4096,Aug,17,10:36,PowerShell
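
Putting the pieces together without sed, the whole ls -l conversion can be sketched as one pipeline.  The sample lines are copied from the listing above so that this runs on any platform; on Linux you would start the pipeline with ls -l instead.

```powershell
# Sample lines from the ls -l output above.
$lsOutput = @(
    'total 4'
    '-rw-rw-r--  1 tome tome    0 Aug 19 10:33 file1'
    'drwxrwxr-x 11 tome tome 4096 Aug 17 10:36 PowerShell'
)

$files = $lsOutput |
    Select-Object -Skip 1 |                          # drop the "total" line
    ForEach-Object { $_ -replace '\s+', ',' } |      # whitespace runs become commas
    ConvertFrom-Csv -Header 'mode','count','user','group','size','month','day','time','name'

# The results are real objects, so they can be filtered on properties.
$files | Where-Object { [int]$_.size -gt 0 } | Select-Object name, size
```

As with the sed version, file names containing spaces would be split across extra columns, so this suits simple listings only.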

Summary

PowerShell adds a lot more tools into your belt.  When it comes to data manipulation, the paradigm shift to objects over text is a game changer.  I’m personally really happy to have this flexibility and I can’t wait to take advantage of it!

 

Announcing WinArduino – A PowerShell Module that helps you dev an Arduino from a Windows box

I recently started taking the Arduino course on Coursera entitled “The Arduino Platform and C Programming” from the University of California, Irvine.  I was having a blast with the course, but I really wanted PowerShell interfaces while working on code.  Those who know me know that I like my standard interface while developing: ConEmu with a split window for each thing I’m developing, with vim in the top and PowerShell on the bottom, or a PuTTY session (also in ConEmu) with two tmux panes, with vim on the top and bash on the bottom.

I wanted this same setup for Arduino while developing on Windows.  Hence I created the WinArduino project to provide the tools I needed to program the Arduino that can run in my bottom PowerShell pane.

Features

Here’s a list of the two major features in WinArduino:

Compile and Verify

Currently this requires the Arduino IDE to be installed on your system, but I may go deeper and eventually compile/upload directly without it.  For now, it suits my needs by providing a wrapper to the Arduino IDE debug command-line interfaces to compile or compile and upload to my Arduino over USB.

Compiling is done with Invoke-ArduinoVerify and compiling with upload is done with Invoke-ArduinoUpload.

Invoke-ArduinoUpload c:\sketches\Blink\Blink.ino

Serial Input/Output

In the examples folder on the project are some robust examples which include some Arduino sketches to show how you can use PowerShell to either read/debug from serial messages in your Arduino code or how you can use PowerShell as an interface to directly control your Arduino board.  The latter is pretty fun – the example simply lets you write a 0 or 1 to turn off or on an LED, but it really opens the door to a lot of really fun things you can control with PowerShell in the IOT-Arduino world.

Get Started Now!

So what are you waiting for?  Sign up for the coursera course, download the WinArduino module, buy yourself an Arduino kit, and start hacking the electrons in your circuits!

 

Powerbits 10.5 History Revisited hg & he

Not too long ago, I posted a powerbit for a function that I was using called hgrep.  This is the evolution of that function and how I use it ALL the time in Windows PowerShell.

Here are the two functions, hg (history grep) and he (history execute).  Note: hg is also used by mercurial so feel free to change the alias to suit your needs – you can get the gist with a license to use/modify it off of github.

function Get-HistoryGrep {
    param(
        [Parameter(Mandatory=$false, Position=0)]
        [string]$Regex
    )
    get-history |?{$_.commandline -match $regex}
}

function Invoke-History {
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [int]$Id
    )
    get-history $Id |%{& ([scriptblock]::create($_.commandline))}
}

new-alias hg Get-HistoryGrep
new-alias he Invoke-History

Basically, the functions are used like this:

“Oh man, I need to re-execute that function again – it was an invoke-pester command, but it had a bunch of switches that I got wrong, and then finally got right”

PS C:\test> hg pester

  Id CommandLine
  -- -----------
   5 Invoke-Pester -TestName 'test1' -OutputXml out.xml
   6 Invoke-Pester -TestName 'test1' -OutputFile out.xml
   7 Invoke-Pester -TestName 'test1' -OutputFile out.xml -OutputFormat nunitxml

“Oh, that’s the one, number 7! Let me run that again”

PS C:\test> he 7

Now, you can reuse the above over and over without having to look it up. This is exactly how you code in Linux in the shell. The equivalent in Linux is this:

history |grep pester
!7

Unfortunately, we can’t use the “!” special character in PowerShell because it is reserved as the logical NOT operator. However, the general workflow is the same.  If you develop, live, and probably one day die in the shell, then these two functions are essential to your daily life.

One final note: if you run over 4096 commands in the shell, by default “he” won’t work with the earliest commands.  You can modify the special variable, $MaximumHistoryCount, if you want to change this behavior.

Interprocess Communication (IPC) Between Asynchronous Events Triggered in C# to PowerShell

I played with Apache Zookeeper and PowerShell all weekend.  While I won’t dig deeply into what I’ve been doing in this article, I will tell you that the .NET interface requires a lot of async callbacks (event triggering).  Fortunately, I worked out a way to not only receive an event from C# (easy; Bruce Payette taught us that years ago), but I also found a way to communicate variables into the runspace of the action that gets executed when an event is triggered.

We’ll start with a simple C# class with an event to subscribe to.  In this case, I’m going to subscribe to the Changed event of my TestEvent.Watcher class.  This class has a single method which I can use to invoke the event with a string message attached to it.  This code should be invoked prior to running anything else in this post:

$code = @"
namespace TestEvent
{
    using System;

    public class Watcher
    {
        public delegate void ChangedEvent(object sender, WatcherEventArgs e);
        public event ChangedEvent Changed;

        public virtual void InvokeEvent(string message)
        {
            if (Changed != null) {
                Changed(this, new WatcherEventArgs(message));
            }
        }
    }
    public class WatcherEventArgs : EventArgs
    {
        public string message;
        public WatcherEventArgs(string message) {
            this.message = message;
        }
    }
}
"@
add-type -typedefinition $code

Here is the code to create the subscription and invoke the event. Register-ObjectEvent creates a job to run the event actions:

$watcher = new-object testevent.watcher

$job = Register-ObjectEvent -InputObject $watcher -EventName Changed -Action {
    $eventargs.message
}

$watcher.InvokeEvent('Triggering this message')

sleep 1 # just to ensure that the trigger happens
$job |receive-job

When we invoke, it looks like this:

PS C:\> .\test.ps1
Triggering this message

In Zookeeper, I needed to pass the current and valid connection object I had to the watcher so that when the event was triggered it would be sure to use the live object. Unfortunately, I didn’t see a way to pass arguments to the scriptblock, but I realized that I had full access to the $sender variable. In the above example, $sender is the testevent.watcher object I created and stored in the $watcher variable. Therefore, I figured I could just use add-member on $watcher to attach whatever objects I need to the action scriptblock.  These could then be accessed via $sender:

$watcher = new-object testevent.watcher
$watcher |add-member -NotePropertyName 'object' -NotePropertyValue 'value at watcher creation'

$job = Register-ObjectEvent -InputObject $watcher -EventName Changed -Action {
    $eventargs.message
    $sender.object
}

$watcher.InvokeEvent('Triggering this message')

sleep 1 # just to ensure that the trigger happens
$job |receive-job

The results of the above show that $sender.object indeed has the value I set when I registered the event.

PS C:\> .\test.ps1
Triggering this message
value at watcher creation

With that accomplished, I had to test whether the parent script could modify the $watcher object prior to the trigger and have the action see the new value. This would enable me to pass live objects if they ever changed or needed to be updated by the parent script.

$watcher = new-object testevent.watcher
$watcher |add-member -NotePropertyName 'object' -NotePropertyValue 'value at watcher creation'

$job = Register-ObjectEvent -InputObject $watcher -EventName Changed -Action {
    $eventargs.message
    $sender.object
}

$watcher.object = 'value at watcher invokation'
$watcher.InvokeEvent('Triggering this message')

sleep 1 # just to ensure that the trigger happens
$job |receive-job

As you can see, success! The object is updated and passed at the time of the trigger rather than at the time of registration.

PS C:\> .\test.ps1
Triggering this message
value at watcher invokation

In case you’re interested, communication from the triggered job to the parent (the other half of IPC) is very easy. Simply use the messages/objects from the output of the scriptblock itself. In other words, your parent script will need to call receive-job and deal with the messages/output accordingly.

UPDATE

The above technique is sound and still useful for controlling messages to an event trigger.  Using the $sender object is a very clean technique.  However, I realized further in my testing and playing that it’s not always necessary.  Actions have access to GLOBAL and SCRIPT scope variables.  Not only can they access those variables on demand, but they can set them too.  This winds up being an even easier medium for transferring state because it’s actually just updating the live state.
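As a rough illustration of the global-scope approach, here is a minimal sketch. It is not the Zookeeper code: a System.Timers.Timer stands in for the watcher class so the snippet is self-contained, and the variable names ($global:state, $global:seen) are made up for the example. It only demonstrates the global scope.

```powershell
# Assumption: a timer replaces the testevent.watcher class so this runs on its own.
$global:state = 'set by parent before trigger'
$global:seen  = $null

$timer = New-Object System.Timers.Timer
$timer.Interval  = 200      # milliseconds
$timer.AutoReset = $false   # fire only once

$job = Register-ObjectEvent -InputObject $timer -EventName Elapsed -Action {
    # Read a global that the parent set, then write a new value back.
    $global:seen  = $global:state
    $global:state = 'updated by action'
}

$timer.Start()

# Poll for up to 5 seconds so the action has a chance to run.
$deadline = (Get-Date).AddSeconds(5)
while ($null -eq $global:seen -and (Get-Date) -lt $deadline) {
    Start-Sleep -Milliseconds 100
}

$global:seen    # what the action read from the parent
$global:state   # what the action wrote back

# Clean up the subscription and the timer.
Unregister-Event -SourceIdentifier $job.Name
$timer.Dispose()
```

Compared with attaching properties to $sender, this needs no add-member call, but every action shares the same global names, so it is worth choosing them carefully.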

Invoking the GetDiskSpaceInformation method of the IOfflineFilesCache COM interface with C# and PowerShell

This article could also be entitled, “Using an Inproc COM server in C# and PowerShell”.

Part I of this series shows how to invoke the GetDiskSpaceInformation method of the IOfflineFilesCache COM interface via C++. This was accomplished after failing miserably at trying to get it to work with C# and PowerShell. However, after I solved the problem in C++ I understood exactly how it worked and was able to ask better questions in Google to find out how to do the same in C#.

The challenge

First, to explain why it’s hard. When I first looked at the docs, I thought this would be easy. In PowerShell, a COM object can be invoked by instantiating a com object with new-object.  For example:

$app = new-object -comobject excel.application

Even if that didn’t work, I knew that C++ code can often be marshaled to and from the world of C# and managed code with a bit of tinkering. This is usually where you see pinvoke.net and code that leverages add-type.  However, in this case, the libraries do not exist on pinvoke.net.  Basically, because this is really COM rather than a flat exported function, you cannot P/Invoke it.  Also, because there are no tlb files associated, you cannot easily just use the interfaces like they are regular COM objects.

Just to be clear as to why this is the case:  This is a new interface.  It was plugged in by the Windows developers into the latest versions of Windows.  It’s implemented in COM so that other languages can get access to it.  However, it’s not fully at the point where it needs to be flushed into general use.  I expect that in the years to come, we’ll see these interfaces exposed with a tlb and eventually there may even be a PowerShell module that is used to manage the offline files cache directly.  However, if you want access before that day comes, you need to get crafty.

Finding GUIDs with the OLE/COM Object Viewer

The key to invoking this code from C# is to know the GUIDs that were used for both the CLSID and the interface we are trying to get access to.  This can be accomplished by running the OLE/COM Object Viewer.  For me, this was installed with my Visual Studio 2013 and can be found here: C:\Program Files (x86)\Windows Kits\8.1\bin\x86\oleview.exe.

Once in the GUI, you can browse to Object Classes->All Objects->Offline Files Cache Control


We’re looking for the GUID: 48C6BE7C-3871-43CC-B46F-1449A1BB2FF3

Next, if you double-click on that you’ll see the interfaces.  In our case, we want the IOfflineFilesCache interface.

The GUID is 855D6203-7914-48B9-8D40-4C56F5ACFFC5

It should be noted that these GUIDs are static.  You do not need to run the COM viewer on your desktop if you are invoking the exact same interface that I am demonstrating.  The GUIDs are the same on every computer.  However, this is here to show the generic steps that are needed to invoke the methods on one of these no-TLB COM interfaces.
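If you don’t have the viewer installed, a quick way to confirm that a CLSID is registered on a machine is to look it up in the registry. The sketch below relies on the standard COM registration layout (HKEY_CLASSES_ROOT\CLSID\{guid}\InprocServer32). The function name Get-ComClassRegistration is made up for this example, and it only checks the class registration; it does not list interfaces.

```powershell
# Hypothetical helper (name invented for illustration). Windows only.
function Get-ComClassRegistration {
    param([Parameter(Mandatory = $true)][guid]$Clsid)

    # The registry provider only exists on Windows.
    if ($env:OS -ne 'Windows_NT') { return $null }

    $key = 'Registry::HKEY_CLASSES_ROOT\CLSID\{' + $Clsid.ToString().ToUpper() + '}'
    if (-not (Test-Path -LiteralPath $key)) { return $null }

    # In-proc servers register their DLL as the default value of InprocServer32.
    $inproc = "$key\InprocServer32"
    $dll = $null
    if (Test-Path -LiteralPath $inproc) {
        $dll = (Get-Item -LiteralPath $inproc).GetValue('')
    }

    New-Object psobject -Property ([ordered]@{
        Clsid          = $Clsid.ToString().ToUpper()
        Name           = (Get-Item -LiteralPath $key).GetValue('')
        InprocServer32 = $dll
    })
}

# CLSID from this article (Offline Files Cache Control).
Get-ComClassRegistration -Clsid '48C6BE7C-3871-43CC-B46F-1449A1BB2FF3'
```

The function returns nothing when the class is not registered. Keep in mind that a 32-bit and a 64-bit PowerShell session can see different registry views.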

CreateInstance

The first step is to create an instance of the CLSID using the GUID we found above.

Guid ID = new Guid("48C6BE7C-3871-43cc-B46F-1449A1BB2FF3");
Type idtype = Type.GetTypeFromCLSID(ID);
IOfflineFilesCache obj = (IOfflineFilesCache) Activator.CreateInstance(idtype, true);

It should be noted that the code we are using requires:

using System.Runtime.InteropServices;

Unfortunately, in the above bit of code, we are referencing the IOfflineFilesCache type, but it does not yet exist anywhere.  Therefore, we have to help C# know what this is by creating an interface with the ComImport attribute.

Interface Attributes

I should note that everything I’m demonstrating is documented here.  However, it’s a bit kludgy to get through.  Also, there are a few key elements it leaves out.  Specifically, it neglects to inform you that when you create the interface, you must declare all of the methods that exist in the interface, in exact order, leading up to the method you care about using.  The order matters because COM dispatches calls by position in the interface’s vtable, not by name.  The best way to get the methods and the order they are declared in is to read the C++ header file.  This was one of the #include files I showed in part I of this article.  Specifically, you need to view cscobj.h.  A quick search of your hard drive should find it in an SDK folder.  Once you have read through this, you can create the interface in the proper order:

    [ComImport]
    [Guid("855D6203-7914-48B9-8D40-4C56F5ACFFC5"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface IOfflineFilesCache
    {
        [PreserveSig()]
        int Synchronize();

        [PreserveSig()]
        int DeleteItems();

        [PreserveSig()]
        int DeleteItemsForUser();

        [PreserveSig()]
        int Pin();

        [PreserveSig()]
        int UnPin();

        [PreserveSig()]
        int GetEncryptionStatus();

        [PreserveSig()]
        int Encrypt();

        [PreserveSig()]
        int FindItem();

        [PreserveSig()]
        int FindItemEx();

        [PreserveSig()]
        int RenameItem();

        [PreserveSig()]
        int GetLocation();

        [PreserveSig()]
        int GetDiskSpaceInformation(ref ulong pcbVolumeTotal, ref ulong pcbLimit, ref ulong pcbUsed, ref ulong pcbUnpinnedLimit, ref ulong pcbUnpinnedUsed);

        // only need to go as far as the function you need, but rest here for completeness

        [PreserveSig()]
        int SetDiskSpaceLimits();

        [PreserveSig()]
        int ProcessAdminPinPolicy();

        [PreserveSig()]
        int GetSettingObject();

        [PreserveSig()]
        int EnumSettingObjects();

        [PreserveSig()]
        int IsPathCacheable();
    }

You’ll notice that in the above, the GUID is the GUID we found when looking at the interface in the OLE/COM viewer.

Finished code, but let’s do it in PowerShell

So, now that we have the interface and an instance of it, the rest is easy.  The following final bit of code injects the C# into PowerShell via add-type.  I simply return the results as a collection and then convert it into an object in PowerShell, but you could just as easily modify the code to have the object returned directly from C#.

$code = @'
using System;
using System.Runtime.InteropServices;

public class offlinecache {
    public static ulong[] GetOfflineCache() {
        ulong pcbVolumeTotal=0, pcbLimit=0, pcbUsed=0, pcbUnpinnedLimit=0, pcbUnpinnedUsed=0;
        Guid ID = new Guid("48C6BE7C-3871-43cc-B46F-1449A1BB2FF3");
        Type idtype = Type.GetTypeFromCLSID(ID);
        IOfflineFilesCache obj = (IOfflineFilesCache) Activator.CreateInstance(idtype, true);
        int hr = obj.GetDiskSpaceInformation(ref pcbVolumeTotal, ref pcbLimit, ref pcbUsed, ref pcbUnpinnedLimit, ref pcbUnpinnedUsed);
        Marshal.ThrowExceptionForHR(hr); // surface a failed HRESULT instead of silently returning zeros
        ulong[] output = new ulong[5];
        output[0] = pcbVolumeTotal;
        output[1] = pcbLimit;
        output[2] = pcbUsed;
        output[3] = pcbUnpinnedLimit;
        output[4] = pcbUnpinnedUsed;
        return output;
    }

    [ComImport]
    [Guid("855D6203-7914-48B9-8D40-4C56F5ACFFC5"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface IOfflineFilesCache
    {
        [PreserveSig()]
        int Synchronize();

        [PreserveSig()]
        int DeleteItems();

        [PreserveSig()]
        int DeleteItemsForUser();

        [PreserveSig()]
        int Pin();

        [PreserveSig()]
        int UnPin();

        [PreserveSig()]
        int GetEncryptionStatus();

        [PreserveSig()]
        int Encrypt();

        [PreserveSig()]
        int FindItem();

        [PreserveSig()]
        int FindItemEx();

        [PreserveSig()]
        int RenameItem();

        [PreserveSig()]
        int GetLocation();

        [PreserveSig()]
        int GetDiskSpaceInformation(ref ulong pcbVolumeTotal, ref ulong pcbLimit, ref ulong pcbUsed, ref ulong pcbUnpinnedLimit, ref ulong pcbUnpinnedUsed);

        // only need to go as far as the function you need, but rest here for completeness

        [PreserveSig()]
        int SetDiskSpaceLimits();

        [PreserveSig()]
        int ProcessAdminPinPolicy();

        [PreserveSig()]
        int GetSettingObject();

        [PreserveSig()]
        int EnumSettingObjects();

        [PreserveSig()]
        int IsPathCacheable();
    }
}
'@

add-type -TypeDefinition $code

$output = ([offlinecache]::GetOfflineCache()) 
new-object psobject -Property ([ordered] @{
    VolumeTotal = $output[0]
    Limit = $output[1]
    Used = $output[2]
    UnpinnedLimit = $output[3]
    UnpinnedUsed = $output[4]
})

Here’s a trial run:

06:50:06 PS C:\Dropbox\scripts> .\GetOfflineCacheDiskInfo.ps1


VolumeTotal   : 469636214784
Limit         : 109076041728
Used          : 0
UnpinnedLimit : 109076041728
UnpinnedUsed  : 0
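The values come back as raw byte counts. If you prefer something more readable, a small conversion like the following sketch can help. The helper name ConvertTo-GB is made up for this example, and the sample numbers are copied from the run above rather than read from the COM interface.

```powershell
# Sample values copied from the run above; in practice this would be the
# object built from [offlinecache]::GetOfflineCache().
$info = New-Object psobject -Property ([ordered]@{
    VolumeTotal   = 469636214784
    Limit         = 109076041728
    Used          = 0
    UnpinnedLimit = 109076041728
    UnpinnedUsed  = 0
})

# Hypothetical helper: bytes to gigabytes (1GB = 1024^3 bytes), two decimals.
function ConvertTo-GB([uint64]$Bytes) {
    [math]::Round($Bytes / 1GB, 2)
}

$report = New-Object psobject -Property ([ordered]@{
    VolumeTotalGB   = ConvertTo-GB $info.VolumeTotal
    LimitGB         = ConvertTo-GB $info.Limit
    UsedGB          = ConvertTo-GB $info.Used
    UnpinnedLimitGB = ConvertTo-GB $info.UnpinnedLimit
    UnpinnedUsedGB  = ConvertTo-GB $info.UnpinnedUsed
})
$report
```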

Invoking the GetDiskSpaceInformation method of the IOfflineFilesCache COM interface with C++

This is a two-part post.  If you only care about solving this problem in C# and PowerShell, feel free to skip this and move to Part II.

My friend at work came to me with a problem.  There are some functions for offline files that only exist in the API.  Knowing my affinity for taking on bizarre pinvoke API calls, he asked whether or not I could get this function to work in PowerShell.  We’ll go into PowerShell and C# in Part II of this post, but to put it simply, this was an extremely difficult puzzle to figure out.  It was something I had never done and something I had never seen anyone else do (I did a lot of googling and some begging on twitter – so if anyone does have a first post, please post it in the comments).  After failing at C# and PowerShell, I decided to try failing at implementing this in C++.

GetDiskSpaceInformation

The method in question is GetDiskSpaceInformation.  When looking at the documentation, you get a few clues through the two DLLs listed and the one Header, i.e., CscSvc.dll, CscObj.dll, and CscObj.h.  As you can see, there are no examples.  After some poking, I was able to find an example for another function, GetEncryptionStatus.  However, this is the easy part.  To me, this is just C++ and I could figure this out.  The hard part is called out in the code example for GetEncryptionStatus with an assumption on the first line of comments.  It says that we “Assume we already have a cache ptr”, referring to the IOfflineFilesCache *pCache interface.

IOfflineFilesCache Interface

The documentation for this also has no examples nor do any of the additional interfaces.  Sigh – this I think is why fewer people do C++ at all in the Microsoft world.  It’s a very frustrating web of documentation.  Regardless, the clue to figuring out how to implement this interface is in a single line in the IOfflineFilesCache interface documentation underneath the “When to use” section of the doc:

Create this object as an in-proc COM server using the class ID CLSID_OfflineFilesCache.

So this is COM – but not your C#-managed-code COM.  No, that would be too easy.

In-proc COM server client

I will spare you the hours of searching and tinkering to get this right.  The end result is that you must use a specific function called CoCreateInstance.  Of course, you will find zero posts that show exactly how to do this.  So finally, here it is:

IOfflineFilesCache *pCache;
HRESULT hr;
CoInitialize(nullptr);
hr = CoCreateInstance(CLSID_OfflineFilesCache, NULL, CLSCTX_INPROC_SERVER, IID_IOfflineFilesCache, (void**)&pCache);
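if (FAILED(hr)) {
    // HRESULTs are conventionally shown in hex (e.g. 0x80070005)
    std::cerr << "ERROR: 0x" << std::hex << static_cast<unsigned long>(hr) << "\n";
    return 1; // non-zero exit code signals failure
}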

I have the following includes in my code when invoking the above:

#include "stdafx.h"
#include <cscobj.h>
#include <iostream>

The final code and ultimate solution

Without further ado, here is the full C++ code to invoke this function.  It displays the 5 values returned to stdout.

#include "stdafx.h"
#include <cscobj.h>
#include <iostream>
int _tmain()
{
    IOfflineFilesCache *pCache = nullptr;
    HRESULT hr = CoInitialize(nullptr);
    if (FAILED(hr)) {
        std::cerr << "ERROR: 0x" << std::hex << static_cast<unsigned long>(hr) << "\n";
        return 1;
    }
    hr = CoCreateInstance(CLSID_OfflineFilesCache, NULL, CLSCTX_INPROC_SERVER, IID_IOfflineFilesCache, (void**)&pCache);
    if (FAILED(hr)) {
        std::cerr << "ERROR: 0x" << std::hex << static_cast<unsigned long>(hr) << "\n";
        CoUninitialize();
        return 1;
    }
    ULONGLONG pcbVolumeTotal = 0;
    ULONGLONG pcbLimit = 0;
    ULONGLONG pcbUsed = 0;
    ULONGLONG pcbUnpinnedLimit = 0;
    ULONGLONG pcbUnpinnedUsed = 0;
    hr = pCache->GetDiskSpaceInformation(&pcbVolumeTotal, &pcbLimit, &pcbUsed, &pcbUnpinnedLimit, &pcbUnpinnedUsed);
    if (FAILED(hr)) {
        std::cerr << "ERROR: 0x" << std::hex << static_cast<unsigned long>(hr) << "\n";
        pCache->Release();
        CoUninitialize();
        return 1;
    }
    std::cout << "VolumeTotal:\t"   << pcbVolumeTotal   << "\n";
    std::cout << "Limit:\t\t"       << pcbLimit         << "\n";
    std::cout << "Used:\t\t"        << pcbUsed          << "\n";
    std::cout << "UnpinnedLimit:\t" << pcbUnpinnedLimit << "\n";
    std::cout << "UnpinnedUsed:\t"  << pcbUnpinnedUsed  << "\n";
    // Release the interface pointer and shut COM down before exiting
    pCache->Release();
    CoUninitialize();
    return 0; // zero exit code signals success
}

The code can also be found on github.
Note: This was done on a Windows 8.1 box using Visual Studio 2013 (important, because the libs are not in all versions of Windows).

Tada

Once compiled, you have a nice utility that wraps this function.

In part II of this article, I will show you how the above is done in C#.  Once in C#, it’s just a copy/paste to get it to work in PowerShell via Add-Type.

Look back at 2014 and look forward to 2015

A look back at 2014

2014 was the year of containers for me.  I spent a lot of time looking at Docker, Kubernetes, Mesos with Marathon and Aurora, and working (and using) tooling around the core Linux Kernel components that make these platforms possible.


Additionally, this was the year that I fully embraced and understood Apache Zookeeper and played with etcd.  For me, Zookeeper becomes the foundation of nearly any distributed system I write.  It’s easy and it works.  Etcd has a place when Zookeeper is overkill, but I have yet to use it in anything real.

Elasticsearch – I finally got dirty with Elasticsearch.  I have a lot of positive things to say about it.  I’m interested in seeing how far the software can be taken, specifically whether the datastore (that can now be backed up) will become an actual data tier rather than just an efficient layer on top of a data tier.

In the Perl world, I learned Moose, which makes Perl actually usable in modern-day programming.  It provides objects and types to Perl.

I was really happy to have the opportunity to implement ZeroMQ into something I was working on.  I am really excited by this library and I hope 2015 gives me a chance to write something about PowerShell and ZeroMQ.  There are very few platform and language agnostic libraries out there.  Additionally, the perfect abstraction with robust patterns and documentation make it a lot of fun to tinker with new communication topologies with just a few minor changes to code.  I have not been so inspired or zealously passionate about something like this since PowerShell 2.0 took my brain.

I played with AngularJS a bit.  It was fun.  However, every time I sit down to do UI work, I feel defeated.  It’s just something I’m not amazing at nor do I think I really want to be.  I’m glad to understand the framework and how it works, but I’ll save the meaty part of that work for others.

Pester Pester Pester!!! Test-driven development took over my PowerShell life (as well as all other languages).  My coding takes longer, but I have much more faith in it.  Pester is the greatest thing to happen to PowerShell from the community ever!  I was really happy to work with my internal corporate teams to build SDLC best-practices for PowerShell that involve Continuous Integration (CI) and Pester as the testing framework.

In 2014 I was introduced by my mentor to some time management techniques outlined in Eat that Frog.  Basically it’s about turning your life into Agile.  That sounds strange and the book doesn’t mention agile once – it is my interpretation of it.  It’s really about prioritizing daily and choosing the items that are most impactful to your company and yourself while deprioritizing everything else.  Additionally, I have adopted the more and more common practice of ignoring most e-mails and checking them less frequently throughout the day.  If it’s important, they will get back to you in a way that you cannot ignore.  Otherwise, it’s just getting in the way of the things you prioritized for the day.  If you start to follow this advice, I would add that you should also set up some alerts to ensure that certain people are never ignored.

An unhealthy obsession with gaming returned to my life in 2014.  However, I was successful in squashing it at the end of 2014 – well all of it except for the updates to Candy Crush I have to do when a new level comes out 🙂  Hopefully the squash will return some valuable time I need in order to blog a bit more and round myself in the wee hours of the night.

The 2015 Hitlist


Golang

Use golang in an actual project.  I really like golang.  It feels like an interpreted language (PowerShell, Python, Perl), but it is compiled and has the potential to automagically make things more efficient as the engine matures.  I have played with golang a bit, but I want to find a project to use it with that will prove/disprove my initial thoughts about the language.

Openstack

Learn Openstack.  I’m sick of being in conversations where I cannot speak with authority about what Openstack can and can’t do.  I need to understand all of the components and how they work.  This is pure lab time that I should have done last year.

Public cloud

Re-evaluate the cloud providers.  It’s been about two years since I last looked at AWS and Azure.  I’d like to get a handle on the current offerings and take a closer look at the Google Compute Engine stuff.

PowerShell Talks and Blogs

Put some new talks together about PowerShell 5.0 and the ZeroMQ library.  Perhaps finally blog or do a talk about the heuristic and deterministic algorithm implementations I have done with PowerShell.

Publish my unpublished article around running a PowerShell script as any user in AD without credentials 🙂 (it’s possible, but requires a configuration you will likely never want to do – but hey, from a research perspective, it’s fun to try).

Podcast

Revisit the cmdlet of the day podcast (no link because the storage is currently not working).  Of all of the things I have ever been involved with, this is the one that I get the most positive feedback from.  I have been thinking it would be fun to kick off the year giving 5-minute discussions about enterprise scripting best practices. There’s so much potential in the short-form podcast format for something highly technical.  I’d love to do this right and perhaps inspire others to pick up the torch in similar technologies that I could benefit from listening to.

The 2015 Watchlist


PowerShell

PowerShell as a development language – I still firmly believe that PowerShell is one of the best languages ever written.  In my opinion it is a better interpreted development language than Python and Perl.  I would love to see it used the way it should be.  This is probably a losing battle as Microsoft’s focus is on making it feel like a development language strictly to get providers written for DSC, but I constantly hold my breath waiting for something more.  My hope is that the open sourcing of .NET along with the new language mode in 5.0 may open that door a bit more.  However, my face is slowly turning blue and I may not see the sweet release of air any time soon.  Additionally, I just don’t work on anything that would allow me to prove this outside of little pet projects here and there.  I suppose this is more of a crylist entry than a watchlist entry.

Microsoft open source

.NET being open sourced.  What does it mean?  What’s going to happen with it next?

Windows containers

Containers on Windows – What in the world does it all mean?  How will it manifest to IT shops, and how can I exploit it for the benefit of cheap, secure, and flexible compute where I work?

Checkpoint/Restore

In early 2013, I ran a successful POC that leveraged CRIU to migrate an app including all of its state from one Linux server to another and have it start running again as if nothing happened.  Why is this not being exploited or am I missing projects that are leveraging it?  Either way, it’s still the most cutting-edge bit of magic out there.  I can’t wait to see where it goes.

Happy New Year!

Announcing PshOdata – Cmdlets that make Odata easy

I’m sitting in the PowerShell summit and just left the stage where I presented how the Odata extensions work and shared my new project (contributors welcome) to bring PowerShell-driven Odata endpoints to the masses.

To make it clear just how easy it is to use, here is the code required to create a web service around get-process and stop-process.

$class = New-PshOdataClass Process -PK ID -Properties 'Name','ID'
$class |Set-PshOdataMethod -verb get -cmdlet get-process -Params Name, ID -FilterParams Name
$class |Set-PshOdataMethod -verb delete -cmdlet stop-process -FilterParams ID
$class |New-PshOdataEndpoint

This will generate three files in a folder called odata that can be copied up to your IIS server. After an IISReset, the Process endpoint will be available via the following URLs:

I hope you enjoy, and please post issues on GitHub if you encounter any.

Github Project
