The Parallel Extension for .NET (aka PFX) is a new API coming to .NET 3.5 (and included out-of-the-box with .NET 4.0) that greatly simplify and enhance the parallel-processing story in .NET.  Sure, we’ve got Thread and ThreadPool, but these classes are very lacking.  Fortunately, PFX is a huge improvement.  Be sure to check out their blog and the Parallel Computing Center on MSDN for more information.

One of the things I ran into last week was the need to calculate an average of a series of numbers.  Each number was computed by a long-running method, and since each number was independent, it was a good candidate for parallelization.  While parallelzing a set of operations is trivial with PFX, parallelizing an aggregation like this isn’t as straight forward.  There’s a "right" way to do it and a "wrong" way to do it.  The simplest answer is to use some sort of synchronization primitive in your parallelized code to safely increment the running sum variable, like this:

   1: Parallel.ForEach(someObjects, 
   2:                     theObject =>
   3:                         {
   4:                             int value = CalculateValue(theObject);
   5:                             Interlocked.Add(ref sum, value);
   6:                         });

However, this is going to cause a lot of unnecessary thread synchronization that’s going to harm performance.  A better (but more complicated way) is to have each thread keep its own running sum, then combine each thread’s output at the end into a global sum.  On this front, I found the documentation and (limited) samples to be a little less helpful than I would have liked (especially for such a trivial problem), but I managed to piece together the solution easily enough:

   1: Parallel.ForEach(someObjects, 
   2:                     () => 0,
   3:                     (theObject, index, localSum) =>
   4:                         {
   5:                             localSum += CalculateValue(theObject);
   6:                         },
   7:                         localSum => Interlocked.Add(ref sum ,localSum));

This is a little more complicated, so let’s walk through it: the first parameter is the set of objects to ForEach over in parallel.  The second parameter is a delegate that returns an object to use for each thread’s local state.  This can be any object you want, but in this case, since we’re calculating a sum, it’s just an int, initialized to zero.  Next is the actual code that we want to execute in parallel.  This time, it takes in three parameters instead of just one: the object to process, the index of the object in the enumerable (which this code isn’t using, but has to include since that’s how the delegate is defined), and the thread-local state, which in our case is the local counter (which we initialized to zero).  Again, we calculate a sum, but this time it’s just tacked on to our local sum.  Finally, we have another expression that takes the integer localSum as input.  This is invoked once per thread at the end of the thread’s work.  It takes the thread’s local sum and adds it to our global, shared sum.  Since this could be invoked on multiple threads at once, it still needs to be synchronized, but unlike the original parallel implementation, we’re only doing synchronization once per thread instead of for each item processed by the thread.  Pretty simple, but a bit verbose.  Something more like this would be preferable (and more intuitive) for trivial problems like this:

   1: Parallel.Sum(someObjects),
   2:                 theObject => CalculateValue(theObject),
   3:                 ref sum);

I thought about wrapping this up in an extension method, but you can’t define extension methods for static classes, so that’s out of the question. :(  Still, I’m very excited about the changes coming to .NET to better enable us developers to leverage modern-day multi-core systems.