Tome's Land of IT

IT Notes from the Powertoe – Tome Tanasovski


Median and Mode in a Measure-Object Proxy Function or How to Add Properties to the Return Object in a Proxy Function

I was poking around Khan Academy for something to do.  Because I’ve been living and breathing data, I thought it only appropriate to run through the statistics lessons up there.  While I’m no slouch at statistics, I figured it couldn’t hurt to listen to lesson 1: Mean, Median, and Mode.  I started to think about how to perform these calculations in PowerShell.  Mean required no thought at all:

Mean

$data = (0,1,1,3,5)
($data |Measure-Object -Average).Average

Median and Mode required some thought.  I quickly mocked up the following which worked like a charm:

Median

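$data = (0,1,1,3,5)
# Sort first; the median is defined on ordered data
$data = $data |Sort-Object
if ($data.count%2) {
    # Odd count: take the middle element
    $MedianValue = $data[[math]::Floor($data.count/2)]
}
else {
    # Even count: average the two middle elements
    $MedianValue = ($data[$data.Count/2],$data[$data.count/2-1] |measure -Average).average
}
$MedianValue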
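
The sample data only exercises the odd branch. As a quick check of both branches, the same logic can be wrapped in a small function. This is a minimal sketch: the name Get-Median and the [double[]] parameter are assumptions made for illustration, not part of PowerShell.

```powershell
# Hypothetical helper wrapping the median logic; the name Get-Median is an assumption.
function Get-Median {
    param(
        [Parameter(Mandatory = $true)]
        [double[]] $Data
    )

    # Work on a sorted copy so the caller's array is left untouched
    $sorted = @($Data | Sort-Object)
    $middle = [int][math]::Floor($sorted.Count / 2.0)

    if ($sorted.Count % 2) {
        # Odd number of values: the middle element is the median
        $sorted[$middle]
    }
    else {
        # Even number of values: average the two middle elements
        ($sorted[$middle - 1] + $sorted[$middle]) / 2
    }
}

$oddMedian  = Get-Median -Data 0,1,1,3,5   # five values, middle one is 1
$evenMedian = Get-Median -Data 0,1,3,5     # four values, average of 1 and 3
```

With these inputs $oddMedian is 1 and $evenMedian is 2.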

Mode

$data = (0,1,1,3,5)

$i=0
$modevalue = @()
foreach ($group in ($data |group |sort -Descending count)) {
    if ($group.count -ge $i) {
        $i = $group.count
        $modevalue += $group.Name
    }
    else {
        break
    }
}
$modevalue
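
One thing to keep in mind: Group-Object exposes each group's key through the Name property, which is a string, so the loop above hands back '1' rather than the integer 1. If the original type matters, a variation along the following lines may help. It is only a sketch, and the name Get-Mode is an assumption made for illustration.

```powershell
# Hypothetical helper; the name Get-Mode is an assumption for illustration.
function Get-Mode {
    param(
        [Parameter(Mandatory = $true)]
        [object[]] $Data
    )

    $groups = @($Data | Group-Object)

    # The largest group size is the frequency shared by every mode
    $top = ($groups | Measure-Object -Property Count -Maximum).Maximum

    # Emit the first original (still typed) value of each group with that frequency
    $groups | Where-Object { $_.Count -eq $top } | ForEach-Object { $_.Group[0] }
}

$single = Get-Mode -Data 0,1,1,3,5        # one value occurs most often
$multi  = @(Get-Mode -Data 0,1,1,3,3,5)   # two values tie for most frequent
```

Here $single is the integer 1, and $multi holds both 1 and 3.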

This is all fine and dandy, but working on this made me think of a great talk that Kirk Munro did at TEC 2012 about proxy functions. This is a topic that I’ve been dying to play with, but I have not had the desire beyond curiosity. This, however, was a perfect occasion. I decided to extend Measure-Object to include a -Median and a -Mode parameter.

I’m not going to dig into how to do proxy functions. If you’d like a step-by-step guide, I’d suggest reading Shay Levy’s blog post on Hey Scripting Guy! It’s really the best there is on the subject. However, after you read that article, you will likely scratch your head as I did when thinking about how to perform your own calculation on the objects in the pipeline, and then modify the return object to include new properties.

Perform your own calculation on the objects in the pipeline within the proxy function

The first problem was easy to solve, in my opinion. I wanted to collect all of the objects passed to the Process block, and then do my calculations on this list in the End block. I initialize an array called $data in Begin. Within the Process block, you can simply add $_ to $data.  In the case of Measure-Object, you also need to be mindful of whether someone used the Property parameter. If they did, you need to collect the value of the specified property from each object in the pipeline rather than the object itself.  Here are the relevant snippets, with ellipses (...) marking the omitted code. You will be able to see the full code at the end of this article:

    begin
    {
        try {
            # Initialize my $data array
            $data = @()
...

    process
    {
        try {
            if ($Property) {
                $data += $_.($property)
            } else {
                $data += $_
            }
...
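
To see the collection pattern in isolation, here is a minimal, self-contained sketch. The function name Get-PipelineValue is hypothetical, and it simplifies two things compared with the proxy: it reads $InputObject instead of $_, and it accepts a single property name rather than a string array.

```powershell
# Hypothetical function used only to illustrate the begin/process/end collection pattern.
function Get-PipelineValue {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [psobject] $InputObject,

        [Parameter(Position = 0)]
        [string] $Property
    )

    begin {
        # Start with an empty collection on every invocation
        $data = @()
    }
    process {
        if ($Property) {
            # Keep only the value of the requested property
            $data += $InputObject.$Property
        }
        else {
            $data += $InputObject
        }
    }
    end {
        # The complete list is available here for any calculation
        $data
    }
}

$lengths = 'a', 'bb', 'ccc' | Get-PipelineValue -Property Length
```

After this runs, $lengths contains the three string lengths 1, 2, and 3.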

With the above code, you can now access $data in the End block. However, this is not enough. For my proxy function to feel like a single function, it needs to return my new data along with the object that the original function normally returns.

Modify the return object of the original function

At first glance it looks like you could call Add-Member on $steppablePipeline.End(). This will not work. In the proxy function, the End() method does not actually hand anything back to you, which I think is a bit counter-intuitive. Unfortunately, the only way I have found to solve this problem is to call the original function on the data, and then call Add-Member on the return value of that call. Shay’s article gives a subtle hint here: to call the original, non-proxied command, you must use its module-qualified name (ModuleName\CmdletName).

The only thing you need to be careful about is calling the function with the original parameters.  This can be done by using $pscmdlet.MyInvocation.BoundParameters, but you need to be sure to exclude the InputObject and Property parameters.  The InputObject should be taken from the $data variable you have populated. The Property parameter needs to be excluded because you have already flattened the data down to the value of the property in the Process block, as described in the previous section.  The following code illustrates how all of this can be accomplished in your end block:

$params = @{}
foreach ($key in ($pscmdlet.MyInvocation.BoundParameters.Keys |?{($_ -ne 'inputobject') -and ($_ -ne 'Property')})) {
     $params.($key) = $pscmdlet.MyInvocation.BoundParameters.($key)
}
$return = $data |Microsoft.PowerShell.Utility\Measure-Object @params
$return |add-member noteproperty -Name SomeName -Value SomeValue
$return
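
The same idea can be tried outside a proxy function. In the sketch below, the hand-built $params hashtable is an assumption that stands in for the filtered bound parameters, and the Median value is simply hard-coded for the sample data.

```powershell
$data = 0, 1, 1, 3, 5

# Stand-in for the filtered bound parameters (an assumption for this sketch)
$params = @{ Average = $true; Sum = $true }

# The module-qualified name reaches the real cmdlet even if a function named Measure-Object exists
$return = $data | Microsoft.PowerShell.Utility\Measure-Object @params

# Attach an extra property to the object the cmdlet produced
$return | Add-Member -MemberType NoteProperty -Name Median -Value 1

$return | Select-Object Count, Average, Sum, Median
```

The object in $return keeps the cmdlet's own Count, Average, and Sum properties and gains the new Median note property.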

Here is the final version of my code that extends Measure-Object to include -Median and -Mode. The one decision that makes it feel less like part of the original cmdlet is that I do not add the Median and Mode properties to the return object unless the respective parameters are specified.  I did this deliberately to avoid any performance penalty when the Median or Mode switches are not used.  It’s also debatable whether Measure-Object should return one of its normal properties when the switch for that property was not used, but that’s not something I’m here to debate.

function Measure-Object {
    [CmdletBinding(DefaultParameterSetName='GenericMeasure', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113349', RemotingCapability='None')]
    param(
        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        ${Average},

        [Parameter(ValueFromPipeline=$true)]
        [psobject]
        ${InputObject},

        [Parameter(Position=0)]
        [ValidateNotNullOrEmpty()]
        [string[]]
        ${Property},

        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        ${Sum},

        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        ${Maximum},

        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        ${Minimum},

        # Add my two parameters
        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        $Mode,

        [Parameter(ParameterSetName='GenericMeasure')]
        [switch]
        $Median,
        # Parameters added

        [Parameter(ParameterSetName='TextMeasure')]
        [switch]
        ${Line},

        [Parameter(ParameterSetName='TextMeasure')]
        [switch]
        ${Word},

        [Parameter(ParameterSetName='TextMeasure')]
        [switch]
        ${Character},

        [Parameter(ParameterSetName='TextMeasure')]
        [switch]
        ${IgnoreWhiteSpace})

    begin
    {
        try {
            # Initialize my $data array
            $data = @()
            # $data array initialized

            $outBuffer = $null
            if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
            {
                $PSBoundParameters['OutBuffer'] = 1
            }
            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Measure-Object', [System.Management.Automation.CommandTypes]::Cmdlet)

            # Remove my parameters if they are used so that errors are not thrown when passed to the Measure-Object function
            if ($PSBoundParameters['Mode']) {
                $PSBoundParameters.Remove('Mode') |Out-Null            
            }

            if ($PSBoundparameters['Median']) {
                $PSBoundParameters.Remove('Median') |Out-Null            
            }
            #Parameters removed

            $scriptCmd = {& $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        } catch {
            throw
        }
    }

    process
    {
        try {
            # If one of my parameters is used, populate $data with the objects        
            if ($Median -or $Mode) {
                if ($Property) {
                    # The next line ensures that I'm populating the array with the values I should be measuring
                    # if the -Property parameter is used
                    $data += $_.($property)
                } else {
                    $data += $_
                }
            }
            # $data populated
            else {
                $steppablePipeline.Process($_)
            }
        } catch {
            throw
        }
    }

    end
    {
        try {
            # If my parameters are used, calculate and add the property to the return
            if ($Median -or $Mode) {
                # Grab all of the parameters except for InputObject and Property
                $params = @{}
                foreach ($key in ($pscmdlet.MyInvocation.BoundParameters.Keys |?{($_ -ne 'inputobject') -and ($_ -ne 'Property')})) {
                    $params.($key) = $pscmdlet.MyInvocation.BoundParameters.($key)
                }
                # Call the original Measure-Object on the data so that I can add-Member my
                # properties to this later
                $return = $data |Microsoft.PowerShell.Utility\Measure-Object @params
                if ($Median) {
                    $data = $data |sort
                    if ($data.count%2) {
                        #odd
                        $medianvalue = $data[[math]::Floor($data.count/2)]
                    }
                    else {
                        #even
                        $MedianValue = ($data[$data.Count/2],$data[$data.count/2-1] |measure -Average).average
                    }    
                    $return |Add-Member Noteproperty -Name Median -Value $MedianValue
                }
                if ($Mode) {
                    $i=0
                    $modevalue = @()
                    foreach ($group in ($data |group |sort -Descending count)) {
                        if ($group.count -ge $i) {
                            $i = $group.count
                            $modevalue += $group.Name
                        }
                        else {
                            break
                        }
                    }
                    if ($modevalue.Count -gt 1) {
                        $return |Add-Member Noteproperty -Name Mode -Value $modevalue
                    } else {
                        $return |Add-Member Noteproperty -Name Mode -Value $modevalue[0]
                    }
                }
                $return
            }
            else {
                $steppablePipeline.End()
            }
        } catch {
            throw
        }
    }
    <#
    .ForwardHelpTargetName Measure-Object
    .ForwardHelpCategory Cmdlet
    #>

}

Next Steps

The only thing remaining is to consider whether or not I should even use $wrappedCmd at all. Part of me thinks it might be best to drop it completely and create a function that just collects InputObject into a list to be used later. Part of me says this is not worth thinking about right now. The latter has won. Good night.
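
For what it’s worth, a version without the steppable pipeline could look roughly like the sketch below. Everything in it is an assumption made for illustration: the name Measure-ObjectEx, the reduced parameter set (only -Average, -Sum, and -Median), and the choice to leave out -Property and -Mode to keep it short.

```powershell
# Hypothetical stand-alone wrapper; the name and the reduced parameter set are assumptions.
function Measure-ObjectEx {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [psobject] $InputObject,

        [switch] $Average,
        [switch] $Sum,
        [switch] $Median
    )

    begin { $data = @() }
    process { $data += $InputObject }
    end {
        # Forward only the switches the real cmdlet understands
        $params = @{}
        foreach ($name in 'Average', 'Sum') {
            if ($PSBoundParameters.ContainsKey($name)) {
                $params[$name] = $PSBoundParameters[$name]
            }
        }
        $result = $data | Microsoft.PowerShell.Utility\Measure-Object @params

        if ($Median) {
            $sorted = @($data | Sort-Object)
            $middle = [int][math]::Floor($sorted.Count / 2.0)
            if ($sorted.Count % 2) {
                # Odd count: middle element
                $value = $sorted[$middle]
            }
            else {
                # Even count: average of the two middle elements
                $value = ($sorted[$middle - 1] + $sorted[$middle]) / 2
            }
            $result | Add-Member -MemberType NoteProperty -Name Median -Value $value
        }
        $result
    }
}

$stats = 0, 1, 3, 5 | Measure-ObjectEx -Average -Median
```

The trade-off is that nothing is streamed to the original cmdlet; every object is held in memory until the End block, which is what the Median and Mode path of the proxy function already does.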
