There are two lists:

    var proxy = new List<string>(); // 4 proxies
    var urls = new List<string>();  // 100 URLs

I am using Parallel.For to make the requests in multiple threads. The point is that each proxy should make only 25 requests (25 * 4 = 100). As I see it, the best option is to split the list into chunks of urls.Count / proxy.Count. However, if the collection is larger, it may turn out that the same thread with one proxy takes data from the collection several times. What should I do?

    Parallel.ForEach(
        Partitioner.Create(0, urls.Count, urls.Count / proxy.Count),
        new ParallelOptions { MaxDegreeOfParallelism = 10 },
        range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                ....
            }
        });
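
A sketch of one way to avoid tying the chunk size to urls.Count / proxy.Count is to pick the proxy from the loop index, so the mapping works for any collection size (this assumes the proxy and urls lists declared above; the request itself is elided):

    // Sketch only: round-robin proxy selection by index.
    // Each proxy handles every proxy.Count-th URL, regardless of urls.Count.
    Parallel.For(0, urls.Count,
        new ParallelOptions { MaxDegreeOfParallelism = proxy.Count },
        i =>
        {
            string currentProxy = proxy[i % proxy.Count]; // round-robin
            // ... request urls[i] through currentProxy ...
        });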
  • Please describe the entire task. The thing is that parallelization is intended for CPU-bound work, while the description mentions proxies and URLs, which suggests IO-bound work. - Alexander Petrov
  • Why do you need an exactly equal number of requests per proxy? If you trust the automatic distribution, the work is in theory divided not into equal parts but so as to minimize the total running time - isn't that what you need? And it is not clear why you don't just use async/await and the TPL, since your bottleneck is not the CPU. - VladD
  • In C# this is done with a one-liner: range.AsParallel().WithDegreeOfParallelism(4).ForAll(...). Why do you need to split the collection at all? - Raider
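
For reference, the PLINQ one-liner from the last comment could be applied to this question roughly as follows (a sketch; the round-robin proxy choice is an assumption, not part of the comment):

    // Requires using System.Linq; proxy and urls are the lists from the question.
    Enumerable.Range(0, urls.Count)
        .AsParallel()
        .WithDegreeOfParallelism(proxy.Count) // 4 proxies -> 4 parallel workers
        .ForAll(i =>
        {
            string currentProxy = proxy[i % proxy.Count];
            // ... download urls[i] through currentProxy ...
        });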

1 answer

As already written above, it’s not a good idea to use a parallel loop for your task.

I recently had a similar task where I had to make 110,000 requests and then parse the results. Sequential processing took 16 hours, while five threads did it in about 2 hours 10 minutes.

I would advise you to create several Tasks (I used the number of processor cores minus 1), split the collection by the number of tasks, and pass each chunk to the Task's delegate as a parameter.

The code looks something like this:

 await Task.Run(() => StartHtmlReadingAsync(lines, maxThreads)); 

...

    private async Task<int> StartHtmlReadingAsync(string[] lines, int maxThreads)
    {
        // SplitTo is the author's extension method that splits the array into chunks
        string[][] splittedArray = lines.SplitTo(maxThreads - 1);
        Task<int>[] listOfActions = new Task<int>[splittedArray.Length];

        for (int index = 0; index < splittedArray.Length; index++)
        {
            int idx = index; // capture a copy of the loop variable for the closure
            listOfActions[idx] = Task.Factory.StartNew(() => ReadFromSiteAsync(splittedArray[idx])).Unwrap();
        }

        int[] results = await Task.WhenAll(listOfActions);
        return results.Sum(); // total across all chunks
    }

...

 private async Task<int> ReadFromSiteAsync(string[] objs) { ... } 
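
Note that SplitTo in the first snippet is not part of the BCL; it is presumably the author's own extension method. A possible implementation (an assumption, shown only so the example compiles) could split the array round-robin into roughly equal chunks:

    using System.Linq;

    public static class ArrayExtensions
    {
        // Splits the source array into `parts` roughly equal chunks (parts must be >= 1).
        public static string[][] SplitTo(this string[] source, int parts)
        {
            return source
                .Select((line, index) => new { line, index })
                .GroupBy(x => x.index % parts)
                .Select(g => g.Select(x => x.line).ToArray())
                .ToArray();
        }
    }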
  • You also did not fully describe your task. Were those 16 hours CPU load, or requests to some server? If the bottleneck was waiting for responses, then you can create more tasks than there are cores. - Alexander Petrov
  • The 16 hours were spent sending requests to the server and waiting for responses. Yes, there can be more tasks than cores, but beyond that the gain was no longer significant. - Lev Limin
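
As the comments above point out, this kind of work is IO-bound, so concurrency does not have to be tied to the core count. A sketch of an alternative that throttles concurrent requests with SemaphoreSlim (HttpClient, the class and method names, and the concurrency limit are assumptions, not taken from the answer):

    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public static class Downloader
    {
        private static readonly HttpClient client = new HttpClient();

        // Downloads all URLs with at most maxConcurrency requests in flight at once.
        public static async Task<string[]> ReadAllAsync(string[] urls, int maxConcurrency)
        {
            using (var gate = new SemaphoreSlim(maxConcurrency))
            {
                var tasks = urls.Select(async url =>
                {
                    await gate.WaitAsync();
                    try
                    {
                        return await client.GetStringAsync(url);
                    }
                    finally
                    {
                        gate.Release();
                    }
                });
                return await Task.WhenAll(tasks);
            }
        }
    }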