In posts, such as: The Future of the Parallelism and its Challenges I mentioned that synchronization the access to the shared resource is the major challenge to write parallel code.
The synchronization and coordination take long time from the overall execution time, which reduce the benefits of the parallelism; the synchronization and coordination also reduce the scalability.
There are a few forms of synchronization and coordination, such as:
But there are a few techniques to avoid these issues, such as: Shared-Nothing, Actor Model, and Hyper Object (AKA Combinable Object). Simply if we reduce the shared data by re-architect our code this will gives us a huge benefits in performance and scalability.
Shared-Nothing and Actor Model
Personally I ‘m fan of these concepts; Shared-Nothing is architecture in which each node (i.e. execution thread) is independent and self-sufficient, and there is no single point of contention across the system. The principle behind Shared-Nothing is make every execution thread independent from the other threads, which improve the scalability and the performance. Also the pure Shared-Nothing can scale almost infinitely simply by adding nodes in the form.
Processing Huge File - Case
Let’s assume that we want process huge file (xx Gigabyte file) in forward-only fashion. For example: count word frequency, indexing, etc. So our parallelism implementation could be:
Load chunk of data, and process it in parallel. Like this pseudo Cilk C++ code:
While (! Stream.EndOfFile())
{
data = Stream.LoadChunk();
for (i = 0; I < data.Length; ++i)
spawn processLine(data[i])
}
I used Cilk C++ because it’s very simple, however you can use your parallelism framework.
This way to processing huge file is ineffective because:
Actually there are a few to methods to ignore the above issues, such as: Shared-Nothing Architecture, and Actor Model. So let’s fix it…
We will re-architect our code to make every piece independent. So we will have components such as:
The File Reader and every Chunk Processor will work independent. At the start you will initialize the Buffer (Fixed size concurrent queue) and File Reader in another thread, then initialize the workers everyone in different threads and map it to physical thread. For just start a new Task into your parallelism framework. The worker should workers in parallel try to get chunk from the buffer and process it.
What the different? This method will remove the direct dependence between the components, so everyone can scale independently. Now you can reduce the chunk size to reduce the waiting time and because every worker already working and there is no parallel overhead at all attached into the main thread after reading the data.
Combinable Object (A.K.A. Hyper Object)
Combinable Object or as Cilk like to call it ‘Hyper Object’ it’s a technique to avoid the synchronization and it’s problems by create combinable object per thread that at end of the task could combined to give you the final object or result. With Combinable object every thread access it is own copy, which means this is no need to synchronization and this copy will be cached effectively into the processors cache. See the following C sharp with Parallel Runtime Library simple code:
Combinable<List<string» CO = new Combinable<List<string»();
Parallel.For(0, ItemCount, i =>
{
CO.Local.Add(i.ToString());
});
List<string> result = CO.Combine((r, temp)=> {r.AddRange(temp); });
The List<> object itself not thread-safe, but the Combinable object solve the problem by create different List<> object per thread. And at the end you will be able to combine the objects together. But access the combinable objects itself require internal synchronization
However most of the Parallel frameworks today implement Prepare (i.e. local initialize) technique to maximize the benefits of the Combinable object. See the following C sharp with Parallel Runtime Library simple code:
Combinable<List<string» CO = new Combinable<List<string»();
Parallel.For(0, ItemCount, () => CO.Local, (i, local) =>
{
local.Add(i.ToString());
});
List<string> result = CO.Combine((r, temp)=> {r.AddRange(temp); });
It’s almost the same except we prepare and cache the thread local copy of the List<> object before the loop body, which reduce Local call to one time only.
In general Combinable Object, Shared-Nothing, and Actor model can improve you performance and scalability, but you should
— Haytham ElFdaeel - July 2009