Tome's Land of IT
IT Notes from the Powertoe – Tome Tanasovski
ForEach-Parallel
May 3, 2012
I just came back from the PowerShell Deep Dive at TEC 2012. A great experience, by the way. I highly recommend it to everyone. Extremely smart and passionate people who could talk about PowerShell for days along with direct access to the PowerShell product team!
During this summit, workflows were a topic of conversation. If you have looked at workflows, there is one feature that generally catches the eye – I know it caught mine the first time I saw it – ForEach-Parallel. Unfortunately, when you dig into what it’s doing you come to learn that it is not a solution for multithreading in PowerShell. Nope, it’s extremely slowwwwwwwwwwwwwww. If you’re like me, parallel processing is key to getting some enterprise-class scripts to run faster. You may have played with jobs before, but even they have some overhead that causes them to slow down. Running scripts side by side works, but requires you to engineer the scripts in a way that they can be called like that. So what is the best way to run something like a loop of data across four threads? The answer is runspaces and runspace pooling.
function ForEach-Parallel {
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [System.Management.Automation.ScriptBlock] $ScriptBlock,

        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        [PSObject] $InputObject,

        [Parameter(Mandatory=$false)]
        [int] $MaxThreads = 5
    )
    BEGIN {
        # One pool shared by every pipeline item, capped at $MaxThreads runspaces
        $iss = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
        $pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads, $iss, $Host)
        $pool.Open()
        $threads = @()
        # Wrap the caller's code so the current item is available as $_
        $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $ScriptBlock.ToString())
    }
    PROCESS {
        # Queue one asynchronous invocation per input object
        $powershell = [PowerShell]::Create().AddScript($ScriptBlock).AddArgument($InputObject)
        $powershell.RunspacePool = $pool
        $threads += @{
            instance = $powershell
            handle   = $powershell.BeginInvoke()
        }
    }
    END {
        # Poll until every invocation has completed, emitting results as they finish
        $notdone = $true
        while ($notdone) {
            $notdone = $false
            for ($i = 0; $i -lt $threads.Count; $i++) {
                $thread = $threads[$i]
                if ($thread) {
                    if ($thread.handle.IsCompleted) {
                        $thread.instance.EndInvoke($thread.handle)
                        $thread.instance.Dispose()
                        $threads[$i] = $null
                    }
                    else {
                        $notdone = $true
                    }
                }
            }
            # Brief pause so the polling loop does not pin a CPU core
            if ($notdone) { Start-Sleep -Milliseconds 50 }
        }
        $pool.Close()
    }
}
With that function, you can do things like this:
(0..50) | ForEach-Parallel -MaxThreads 4 {
    $_
    sleep 3
}
You’ll notice that the above causes batches of four to run simultaneously. Actually, it looks like the data is running serially, but it’s really in parallel. A better example is something like this that simulates that some processes take longer than others:
(0..50) | ForEach-Parallel -MaxThreads 4 {
    $_
    sleep (Get-Random -Minimum 0 -Maximum 5)
}
Mind you, parallel processing doesn’t always make things faster. For example, if your CPU consumption per thread is more than your box can handle, you may be adding latency due to scheduling of the CPU. Another example is that if it’s not a long running process that you are performing in your loop, the overhead for starting up multiple threads could make your script slower. Just use your head and play with it. In the right place at the right time, this is an absolute lifesaver.
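To see the startup overhead for yourself, something like the following rough sketch can help. It times a deliberately trivial piece of work run serially and then through a runspace pool. The item count, the pool size of 4, and the $work scriptblock are all assumptions chosen for illustration, and the numbers will vary from machine to machine.

```powershell
$items = 1..20
$work  = { param($n) $n * 2 }   # deliberately trivial work (illustrative only)

# Plain loop in the current runspace
$serial = Measure-Command {
    $null = foreach ($n in $items) { & $work $n }
}

# Same work pushed through a runspace pool of up to 4 runspaces
$parallel = Measure-Command {
    $pool = [runspacefactory]::CreateRunspacePool(1, 4)
    $pool.Open()
    $jobs = foreach ($n in $items) {
        $ps = [powershell]::Create().AddScript($work.ToString()).AddArgument($n)
        $ps.RunspacePool = $pool
        @{ Instance = $ps; Handle = $ps.BeginInvoke() }
    }
    $null = foreach ($job in $jobs) {
        $job.Instance.EndInvoke($job.Handle)   # blocks until this item is done
        $job.Instance.Dispose()
    }
    $pool.Close()
}

"Serial:   {0:N1} ms" -f $serial.TotalMilliseconds
"Parallel: {0:N1} ms" -f $parallel.TotalMilliseconds
```

With work this small, the pooled version will often come out slower. Swap in something that actually waits (a ping, a remote call) and the balance usually shifts.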
Note: I learned this technique from Dr. Tobias Weltner, but for some reason I can’t find the link to the video where he discussed it.
Dr Tobias Weltner video on multi threading etc
http://bits_video.s3.amazonaws.com/022012-SUPS01_archive.f4v
>> download files: http://powershell.com/cs/media/p/14779.aspx
Maybe the original links for the video from Powershell.com were removed?!
I think I am missing something. The script block populates the variables, but after the scriptblock is finished the variables are no longer populated with data. I need the results of the variables for the next part of my script. What am I missing? $fqdnlist and $deadservers are blank after the script block is done.
I am very excited about this as this is much faster than start-job!
function ForEach-Parallel {
    param(
        [Parameter(Mandatory=$true,position=0)]
        [System.Management.Automation.ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [PSObject]$InputObject,
        [Parameter(Mandatory=$false)]
        [int]$MaxThreads=5
    )
    BEGIN {
        $iss = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        $pool = [Runspacefactory]::CreateRunspacePool(1, $maxthreads, $iss, $host)
        $pool.open()
        $threads = @()
        $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $Scriptblock.ToString())
    }
    PROCESS {
        $powershell = [powershell]::Create().addscript($scriptblock).addargument($InputObject)
        $powershell.runspacepool=$pool
        $threads+= @{
            instance = $powershell
            handle = $powershell.begininvoke()
        }
    }
    END {
        $notdone = $true
        while ($notdone) {
            $notdone = $false
            for ($i=0; $i -lt $threads.count; $i++) {
                $thread = $threads[$i]
                if ($thread) {
                    if ($thread.handle.iscompleted) {
                        $thread.instance.endinvoke($thread.handle)
                        $thread.instance.dispose()
                        $threads[$i] = $null
                    }
                    else {
                        $notdone = $true
                    }
                }
            }
        }
    }
}
$erroractionpreference = "SilentlyContinue"
$colComputers = get-content C:\temp\listserver.txt
$Fqdnlist = @()
$deadservers = @()
$code = {
    $results += $var = nltest /domain_trusts
    $var = $var -split " "
    $var = $var | ? {$_ -like "*.net" -or $_ -like "*.com" -or $_ -like "*.pvt"}
    foreach ($domain in $var)
    {
        $ping = new-object System.Net.NetworkInformation.Ping
        $fqdn = "$_" + "." + "$domain"
        $Reply = $ping.send($fqdn)
        if ($Reply.status -eq "Success")
        {
            Write-host -ForegroundColor Green ("$_" + "." + "$domain")
            $Fqdnlist += ("$_" + "." + "$domain" + ",")
        }
        else
        {
            $deadservers += ("$_")
        }
        $reply = ""
    }
}
($colcomputers) | foreach-parallel -ScriptBlock $code -MaxThreads 20
Yes, variables assigned inside the scriptblock stay in scope only for the duration of the scriptblock. Each scriptblock also runs in its own runspace, so $Fqdnlist += inside it never touches the $Fqdnlist in your calling script. To get data back, output the objects you want from within the scriptblock by putting them on a line of their own. You would then need to work out how to handle the multiple objects that come back.
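As a rough sketch of that idea, the scriptblock below emits one object per input item, and the caller sorts the results into two lists afterwards. It uses a runspace pool directly so it stands alone. The server names and the even/odd "alive" rule are made up purely for illustration and stand in for the real ping test.

```powershell
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Hypothetical work: even numbers count as "alive", odd numbers as "dead".
# The object on a line of its own is what gets returned to the caller.
$code = {
    param($item)
    [pscustomobject]@{
        Name  = "server$item"
        Alive = ($item % 2 -eq 0)
    }
}

# Queue one asynchronous invocation per item
$jobs = foreach ($item in 1..6) {
    $ps = [powershell]::Create().AddScript($code.ToString()).AddArgument($item)
    $ps.RunspacePool = $pool
    @{ Instance = $ps; Handle = $ps.BeginInvoke() }
}

# Collect everything each scriptblock emitted
$results = foreach ($job in $jobs) {
    $job.Instance.EndInvoke($job.Handle)
    $job.Instance.Dispose()
}
$pool.Close()

# Build the two lists in the calling script, where they stay in scope
$fqdnList    = @($results | Where-Object { $_.Alive }      | ForEach-Object { $_.Name })
$deadServers = @($results | Where-Object { -not $_.Alive } | ForEach-Object { $_.Name })
```

The same shape should work with the ForEach-Parallel function from the post: assign its output to a variable and filter that, rather than appending to outer variables from inside the scriptblock.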
You can control this greatly, however, by managing the runspaces yourself. In other words, if you don’t use runspacepools, you can reuse your runspaces and maintain the state of variables. However, this will only be in the scope of your runspaces, but you have more access to them.
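A minimal sketch of what managing a runspace yourself might look like, assuming a single runspace and a made-up $counter variable: two separate invocations run against the same runspace, and the variable set by the first is still there for the second.

```powershell
# One runspace that we own and reuse, instead of borrowing from a pool
$runspace = [runspacefactory]::CreateRunspace()
$runspace.Open()

$ps = [powershell]::Create()
$ps.Runspace = $runspace

# First call: define a variable inside the runspace
$null = $ps.AddScript('$counter = 1').Invoke()
$ps.Commands.Clear()

# Second call on the same runspace: the variable has survived
$second = $ps.AddScript('$counter++; $counter').Invoke()
$ps.Commands.Clear()

# While the runspace is idle, the caller can also read the variable directly
$fromProxy = $runspace.SessionStateProxy.GetVariable('counter')

$ps.Dispose()
$runspace.Dispose()
```

Note that $counter never appears in the calling script's own scope. It lives in the runspace, and you reach it either through another invocation or through SessionStateProxy.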
Pingback: Parallel PowerShell | rambling cookie monster
Hello Tome,
This is coming in quite handy, many thanks!
Have you had to modify the initial session state to add a module, rather than loading the module for every single thread?
I’m assuming we create the default, and then use ImportPSModule, but I’m having trouble translating the developer oriented details on this method to PowerShell!
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$sessionstate.ImportPSModule("ActiveDirectory") #doesn’t seem to work
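For what it's worth, here is a hedged sketch of how ImportPSModule can be wired up. It is not a diagnosis of the snippet above. The assumption is that the same session state object has to be the one passed to CreateRunspacePool, since modules are loaded when the pool's runspaces are created. The DemoGreeting module and its Get-Greeting function are hypothetical, written to a temp file only so the example stands alone.

```powershell
# Hypothetical throwaway module so the sketch is self-contained
$modulePath = Join-Path ([System.IO.Path]::GetTempPath()) 'DemoGreeting.psm1'
Set-Content -Path $modulePath -Value 'function Get-Greeting { "hello from module" }'

# Register the module on the session state (ImportPSModule takes a string array)
$iss = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$iss.ImportPSModule([string[]]@($modulePath))

# Hand that same session state to the pool
$pool = [runspacefactory]::CreateRunspacePool(1, 2, $iss, $Host)
$pool.Open()

# The scriptblock calls the module function without importing anything itself
$ps = [powershell]::Create().AddScript('Get-Greeting')
$ps.RunspacePool = $pool
$output = $ps.Invoke()

$ps.Dispose()
$pool.Close()
Remove-Item $modulePath
```

On a machine where script execution is restricted by policy, loading a .psm1 from a temp folder may be blocked, so treat this as something to experiment with rather than a guaranteed recipe.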
Regards,
CM
Pingback: More on PowerShell multithreading via runspace pools | Dave Wyatt's Blog
Hi Tome!
I would like to invite you to my discussion: Invoke-Parallel need help to clone the current Runspace
http://powershell.org/wp/forums/topic/invpke-parallel-need-help-to-clone-the-current-runspace/
Greets Peter Kriegel
Founder member of the European, German speaking, Windows PowerShell Community
http://www.PowerShell-Group.eu