Good afternoon, colleagues.
I have written a working algorithm for receiving and processing data in a typical producer/consumer scenario, and now I want to take it further.
One of the producer procedures receives a list of files and runs Parallel.ForEach over the list. Each iteration consists of three steps:
- Downloading the file
- Reading the file through Excel's COM interface to get a two-dimensional array of strings
- Creating an object for each row in the array and sending it to the BlockingCollection
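To make the setup concrete, here is a rough sketch of the current structure. The names `Download`, `ReadViaExcel` and `ExcelLock` are hypothetical stand-ins (the real code downloads files and talks to Excel over COM), and a `string[]` stands in for the per-row object:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class CurrentProducer
{
    // Hypothetical lock object guarding the single Excel instance.
    static readonly object ExcelLock = new object();

    // Hypothetical stand-in for the real download step.
    static string Download(string file)
    {
        Thread.Sleep(10);
        return file + ".local";
    }

    // Hypothetical stand-in for reading the sheet through Excel's COM interface.
    static string[,] ReadViaExcel(string path)
    {
        Thread.Sleep(30);
        return new string[,] { { path, "row 1" }, { path, "row 2" } };
    }

    public static void Produce(IEnumerable<string> files, BlockingCollection<string[]> output)
    {
        Parallel.ForEach(files, file =>
        {
            // Step 1: download (runs in parallel).
            string path = Download(file);

            // Step 2: only one iteration at a time may use Excel.
            string[,] rows;
            lock (ExcelLock)
            {
                rows = ReadViaExcel(path);
            }

            // Step 3: one object per row goes to the consumer side.
            for (int r = 0; r < rows.GetLength(0); r++)
            {
                output.Add(new[] { rows[r, 0], rows[r, 1] });
            }
        });
    }
}
```

Every iteration that finishes its download blocks on the `lock` while still occupying its worker thread.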
There are several hundred files, and launching a separate Excel instance for each one would be pointless and expensive, so step 2 is enclosed in a critical section. I could, of course, use a semaphore and process files with several Excel instances, but that is another story and I don't want to get into it here.
In its current state, the loop keeps 4 tasks active (one per processor), so the parallelism is ineffective: 4 files are downloaded quickly, then the tasks queue up for the lock, and the algorithm becomes almost sequential.
Question: how can I suspend the task of the first Parallel.ForEach iteration so that the second one can start, and then come back and finish the first? When I try to use Await, execution leaves the loop and I end up with a mess.
The effective result would be something like this: 4 files are downloaded, the 1st task takes the Excel lock, the other three tasks step aside, three more files are downloaded, the array from the 1st task is processed, the 2nd task takes the lock, 2 more files are downloaded ...
I would also like to try abandoning Parallel.ForEach, splitting the algorithm into three synchronous For Each loops and connecting them through 2 consuming collections (BlockingCollection) to get roughly the behavior described above. Or even write three functions and chain them directly through Yield without any extra collections, which should be even faster. But that is also another story that I will not touch on in this question.
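Only so that the idea is clear, this is roughly what I mean by the three-loop variant. It is an untested sketch, not something I have running: `Download` and `ReadViaExcel` are hypothetical stand-ins, a `string[]` stands in for the per-row object, and I am assuming each loop runs on its own task.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class StagedProducer
{
    // Hypothetical stand-in for the real download step.
    static string Download(string file)
    {
        Thread.Sleep(10);
        return file + ".local";
    }

    // Hypothetical stand-in for reading the sheet through Excel's COM interface.
    static string[,] ReadViaExcel(string path)
    {
        Thread.Sleep(30);
        return new string[,] { { path, "row 1" }, { path, "row 2" } };
    }

    public static void Produce(IEnumerable<string> files, BlockingCollection<string[]> output)
    {
        // The two intermediate collections that connect the three loops.
        var downloaded = new BlockingCollection<string>();
        var parsed = new BlockingCollection<string[,]>();

        // Loop 1: download files and hand the local paths to loop 2.
        Task stage1 = Task.Run(() =>
        {
            try { foreach (string file in files) downloaded.Add(Download(file)); }
            finally { downloaded.CompleteAdding(); }
        });

        // Loop 2: the only loop that touches Excel, so no lock is needed here.
        Task stage2 = Task.Run(() =>
        {
            try
            {
                foreach (string path in downloaded.GetConsumingEnumerable())
                    parsed.Add(ReadViaExcel(path));
            }
            finally { parsed.CompleteAdding(); }
        });

        // Loop 3: turn each row into an object and send it to the consumer side.
        Task stage3 = Task.Run(() =>
        {
            foreach (string[,] rows in parsed.GetConsumingEnumerable())
                for (int r = 0; r < rows.GetLength(0); r++)
                    output.Add(new[] { rows[r, 0], rows[r, 1] });
        });

        Task.WaitAll(stage1, stage2, stage3);
    }
}
```

Here downloading continues while Excel is busy, because the loops only wait on the collections and not on each other.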
In this case, I am not clever enough to work out the asynchrony issue inside the Parallel.ForEach iterations on my own, so I am really counting on your expert advice to help me level up.
Thank you.