Suppose there is a lazy collection of type IEnumerable&lt;T&gt; (no matter how it is obtained) that will be processed in parallel by PLINQ's Select operator. For this, a so-called chunk partitioner will be created, whose partitions are consumed by N worker threads (note that the collection is not indexable, which is why chunk partitioning is used). Each such thread fills its chunks through an iterator of type ContiguousChunkLazyEnumerator. Here is a snippet of the MoveNext method of the ContiguousChunkLazyEnumerator class (from the .NET source code):

    lock (m_sourceSyncLock)
    {
        // ... some .NET bookkeeping elided ...
        try
        {
            for (; i < mutables.m_nextChunkMaxSize && m_source.MoveNext(); i++)
            {
                // Read the current entry into our buffer.
                chunkBuffer[i] = m_source.Current;
            }
        }
        // ... exception handling elided ...
    }

As the code shows, every such iterator works with the shared m_source object of type IEnumerator&lt;T&gt;. My question is this: how can this approach give a performance boost? After all, the lock should essentially kill concurrency (the threads gain nothing from running in parallel), and in theory the same code executed sequentially by one thread, without locking, would have the same performance.
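For concreteness, the setup being asked about can be sketched like this (the lazy source here is invented, not from the question): a non-indexable, lazily produced sequence fed into AsParallel().Select, which forces PLINQ onto the chunk-partitioning path:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ChunkSetup
{
    // A lazy, non-indexable sequence: PLINQ cannot range-partition this,
    // so it falls back to chunk partitioning over the shared enumerator.
    public static IEnumerable<int> Lazy()
    {
        for (int i = 0; i < 100; i++)
            yield return i;
    }

    public static int Sum() =>
        Lazy().AsParallel().Select(x => x * 2).Sum();

    static void Main() => Console.WriteLine(Sum()); // 9900
}
```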

This implementation is not quite clear to me; maybe I missed something? (I'm still inexperienced in multithreading and would be grateful for any answers.)

    2 answers

    By itself, lock is not a problem for the performance of multi-threaded code. The problem arises when threads wait on the lock for a long time (so-called high contention). That is why a synchronized section of code should execute as quickly as possible.

    Everything the code above does under the lock is fill a new chunk ( chunkBuffer ) from the source and update the internal indexes. This is very fast code, so you will not get heavy contention on this section. Even if several threads collide here, each of them will wait only very briefly.
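To see why the short critical section matters, here is a toy model of the same pattern (all names invented; this is not the PLINQ code itself): the lock guards only the cheap copying of the next chunk, while the per-element work happens outside it, in parallel:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ChunkGrab
{
    public static long Run(IEnumerator<int> source, int chunkSize, int workers)
    {
        var gate = new object();
        long total = 0;

        Task Worker() => Task.Run(() =>
        {
            var chunk = new int[chunkSize];
            while (true)
            {
                int count = 0;
                // Short critical section: just copy the next chunk
                // out of the shared enumerator.
                lock (gate)
                {
                    while (count < chunkSize && source.MoveNext())
                        chunk[count++] = source.Current;
                }
                if (count == 0) return;

                // Processing happens outside the lock, in parallel.
                long local = 0;
                for (int i = 0; i < count; i++)
                    local += chunk[i];
                Interlocked.Add(ref total, local);
            }
        });

        Task.WaitAll(Enumerable.Range(0, workers).Select(_ => Worker()).ToArray());
        return total;
    }
}
```

The threads only contend for the time it takes to copy a handful of elements; the rest of the work proceeds concurrently.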

    Upd

    Yes, if the sequence is lazy and its elements are expensive to compute, the code under the lock can indeed run for a long time, and you will get high contention on that section of the code. But besides the "extraction" phase of the elements, PLINQ also has the "computation" phase for those elements. And this, as @kmv noted in the comments, is its main responsibility: PLINQ is primarily meant to parallelize some action over the elements. In that phase you still get a performance boost relative to the single-threaded version. Of course, if in your case computing the elements of the sequence is much more expensive than the subsequent actions, the overall gain will be small.
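A sketch of that point (the workload here is invented): extraction of each element is cheap and serialized behind the partitioner's lock, but the expensive Select projection runs in parallel, which is where the speedup comes from:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PlinqPhases
{
    // Cheap, lazy extraction: serialized behind the partitioner's lock.
    public static IEnumerable<int> Source() => Enumerable.Range(1, 1000);

    // Expensive per-element computation: this is the part PLINQ parallelizes.
    public static long HeavyWork(int x)
    {
        long acc = x;
        for (int i = 0; i < 100_000; i++)
            acc = (acc * 31 + i) % 1_000_000_007;
        return acc;
    }

    public static long ParallelSum() =>
        Source().AsParallel().Select(HeavyWork).Sum();
}
```

Here the time spent under the lock is negligible next to the time spent in HeavyWork, so the worker threads run almost entirely in parallel.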

    • That's right, the interesting situation is when the threads hang on the lock. Indeed, computing a collection element can be an expensive operation, so the code inside the lock may run for a long time. A simple example: computing an element requires fully building a tree associated with it. - LmTinyToon
    • The synchronized code performs two operations, MoveNext and Current, and each of them may take a long time to compute. - LmTinyToon
    • @LmTinyToon retrieving elements from a sequence is not PLINQ's responsibility. Its responsibility is to perform calculations on the (already extracted) elements of the sequence. - kmv
    • @LmTinyToon and I missed the most important part of the question :). Updated the answer. - andreycha

    In short, PLINQ parallelizes actions on a sequence, not the computation of the sequence itself.

    In any case, the IEnumerable interface does not allow itself to be parallelized by design: it is a fundamentally sequential interface. If your sequence elements take too long to compute, you should not expect benefits from PLINQ.

    You either need to change the architecture, removing the slowly computed sequence from it, or abandon "uniform" parallelism in favor of a pipeline. In the second case, the TPL Dataflow library or the BlockingCollection class will help you.

    The idea of the pipeline is that one thread computes the sequence while the operations on the elements are performed on another thread or threads.
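A minimal sketch of such a pipeline using BlockingCollection&lt;T&gt; (the buffer capacity, worker count, and "action" here are arbitrary choices for illustration): one task produces the slowly computed sequence into a bounded buffer, while several consumer tasks perform the action on the elements:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class PipelineDemo
{
    public static List<int> Run(IEnumerable<int> source)
    {
        // Bounded buffer between the producer and the consumers.
        using var buffer = new BlockingCollection<int>(boundedCapacity: 16);

        // One task computes the (potentially slow) sequence...
        var producer = Task.Run(() =>
        {
            foreach (var item in source)
                buffer.Add(item);
            buffer.CompleteAdding(); // signal "no more elements"
        });

        // ...while several consumers process elements as they arrive.
        var results = new ConcurrentBag<int>();
        var consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
        {
            foreach (var item in buffer.GetConsumingEnumerable())
                results.Add(item * item); // the "action on elements"
        })).ToArray();

        Task.WaitAll(consumers);
        producer.Wait();
        return results.OrderBy(x => x).ToList();
    }
}
```

Unlike PLINQ over a shared enumerator, here the producer never blocks the consumers for longer than it takes to hand one element through the buffer.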

    • But executing the action on the sequence does affect the computation of its elements: in the case of lazy collections, the computation is in fact deferred. - LmTinyToon
    • @LmTinyToon yes, it is deferred, but that does not stop it from being sequential. - Pavel Mayorov
    • An interesting mention of Dataflow, I need to look into it. - LmTinyToon