### Top Posts

### Categories

- Looking forward to tonight! twitter.com/sunnyc7/status… 2 weeks ago

IT Notes from the Powertoe – Tome Tanasovski

May 21, 2012

Posted by on I’m going to take a moment to stray from the world of PowerShell to bring us down to some mathematics. I had a lot of trouble finding an exact solution to my problem, so I thought I would share it..

I have two arrays:

$i=@('a','b','c') $j=@('1','2','3')

How many unique pairs can I create from these two data sets. Furthermore, what if the count of items in set $i was greater than the count of items in set $j.

$i=@('a','b','c','d')

I’m not looking to create the algorithm to generate these pairs. I’m just looking to understand how many possible combinations there are in order to understand whether or not I need to explore a matching optimization technique like the genetic algorithm or whether or not I can apply brute force to the problem by trying all combinations of pairs.

I should note that these are obviously hypothetical data sets for the purpose of bringing the problem down to its root. The real data sets are objects that when compared with each other produce an integer value that provides a relative score of how compatible the pair is. You can think of this as a match.com-like problem. The real-world problem also has larger data sets of 25 and 20 objects.

The first problem seems fairly straightforward. The resulting pairs look like this:

a b c a1 b2 c3 a1 b3 c2 a2 b3 c1 a2 b1 c3 a3 b1 c2 a3 b2 c2

There are six possible solutions. As you can see, I leave the $i array in a static order and then try every permutation of $j to generate the unique pairs. If I rearrange $i, it won’t matter because it will not create unique pairs. Therefore the answer to problem #1 is J!. If your rusty on your math symbols, that’s the factorial of j that will account for every possible permutation. For example, $j.count equals 3. Therefore, the total number of possible solutions is 3x2x1 or 6.

This is the tricky one. The answer involves getting all of the possible ways of creating $j.count letters out of $i. For example, the grid from solution #1 is valid:

a b c a1 b2 c3 a1 b3 c2 a2 b3 c1 a2 b1 c3 a3 b1 c2 a3 b2 c2

However, we also know that we can do all of the above permutations of $j with:

a b c a b d a c d b c d

That means I have the 6 possible combinations of $j to try with the four combinations of $i. I can see the 24 possible combinations, but how do I figure that mathematically.

So the question is how can I figure out that $i would have only 4 possible ways of cutting it if I was looking for sets of 3 (the total number of $j.count). In other words, how many sets of a specific size I can create out of any set. This of course has been figured out. I mean, people have been playing with the possible card combinations forever. This is a typical dealing problem. How many combinations of 5 cards can be delivered out of a deck of 52 cards. The solution is the binomial coefficient.

To summarize:

n!/k!(n-k)!

where n is the total number of items in the set and k is the count for the groupings. This means that in my problem where I had 4 letters in $j, and I want to see how many ways I can match them up with my 3 different numbers, I can get the answer with the following:

4!/3! = 4x3x2x1/3x2x1x1 = 24/6 = 4

Now if I multiply this number by the total permutations of $j, I get the total number of combinations:

4x3! = 24

The final formula to determine all of the possible unique pairs can be reduced down as follows. It assumes that n > k.

(n!/k!(n-k)!) * k! n!k!/k!(n-k)! n!/(n-k)!

In the case where n -eq k, then the answer is just n!

Fortunately, this helped me come up with an equation to understand how many possible combinations I would need to test in my script. I happened to have 30 in one set and 29 in the other. That would mean 30! or 265,252,859,812,191,058,636,308,480,000,000 or 265 nonillion possible combinations. That’s definitely way too many to try all of them. Case closed. The genetic algorithm will help me greatly find the best possible combinations without having to try them all. We’ll save that discussion for another day.

Advertisements

%d bloggers like this:

for the record, the genetic algorithm was a horrible solution for this. It takes forever, it’s heuristical, and does just an ok job after throwing real-world data at it. In the end, the munkres (hungarian) algorithm was the best choice: it is deterministic and takes a couple of hours to run.