Effective use of Tpl.Dataflow

Question

The task is as follows: for each incoming track (there may be thousands of them), I need to request data from the scrobbler, process the response (omitted from the example for clarity), and write the result to a file. I decided to use TPL Dataflow, and this is what I came up with:

    static void Main()
    {
        HttpClient hc = new HttpClient();
        StreamWriter sw = new StreamWriter(@"C:\res.txt");

        // First variant: one shared HttpClient
        TransformBlock<string, string> tb = new TransformBlock<string, string>(
            item => hc.GetStringAsync(item),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 200 });

        // Second variant: a new HttpClient per request
        //TransformBlock<string, string> tb = new TransformBlock<string, string>(
        //    item => new HttpClient().GetStringAsync(item),
        //    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 200 });

        ActionBlock<string> ab = new ActionBlock<string>(item =>
        {
            sw.WriteLine(item);
            sw.WriteLine("________________________________________________");
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 200 });

        tb.LinkTo(ab);
        tb.Completion.ContinueWith(item => ab.Complete());
        ab.Completion.ContinueWith(item => sw.Dispose());

        Stopwatch swa = new Stopwatch();
        swa.Start();
        // urls (the collection of request URLs) is defined elsewhere
        foreach (var item in urls)
        {
            tb.Post(item);
        }
        tb.Complete();
        ab.Completion.Wait();
        Console.WriteLine(swa.ElapsedMilliseconds);
    }

As you have probably noticed, there are two variants of the solution:

In the first variant, I use a single shared HttpClient object, but it can only send two requests at a time (by the way, why is that?), which is why the whole process takes quite a while.
The second variant is faster, and much faster if you parallelize not by 4 but by 20, for example. It is not without drawbacks, though: creating each HttpClient object means setting up a new connection (handshake), which is logical. As I understand it, this is extra overhead that can be avoided.

So my actual question is:

Is it possible to somehow bind an HttpClient object to a task, so that once it has executed one request, the next request goes through the same HttpClient object and no new connection is established? That is, I want there to be as many HttpClient objects as MaxDegreeOfParallelism, with each task using an HttpClient object that is not currently occupied by another task. Or is there some other effective solution?

Thank you in advance

Accepted Answer · 2016-12-21T16:17:16

Create a pool of HttpClient objects. Before each request, take an object from the pool (or create a new one if the pool is empty), and after the request, return it to the pool.

    ConcurrentBag<HttpClient> pool = new ConcurrentBag<HttpClient>();
    TransformBlock<string, string> tb = new TransformBlock<string, string>(async item =>
    {
        HttpClient hc;
        if (!pool.TryTake(out hc))
        {
            hc = new HttpClient();
        }
        try
        {
            return await hc.GetStringAsync(item).ConfigureAwait(false);
        }
        finally
        {
            pool.Add(hc);
        }
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 200 });

@Qutrix The code after the await (pool.Add(hc)) does not depend on the synchronization context, so there is no point in capturing the context and resuming on it, which is what happens by default. That is what ConfigureAwait(false) is for.
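
A side note on the two-request limit mentioned in the question. Assuming the code runs on .NET Framework, HttpClient goes through ServicePointManager, whose DefaultConnectionLimit is 2 connections per host for client applications. The sketch below raises that limit before creating a shared client; SharedClientSketch and Create are hypothetical names used only for illustration.

```csharp
using System.Net;
using System.Net.Http;

static class SharedClientSketch
{
    // Hypothetical helper: returns one HttpClient meant to be shared by all requests.
    public static HttpClient Create(int maxConnectionsPerHost)
    {
        // .NET Framework: raise the per-host connection limit (2 by default
        // for client applications). Set it before the first request is sent.
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;

        // On .NET Core and later this setting does not affect HttpClient;
        // HttpClientHandler.MaxConnectionsPerServer is the counterpart there.
        return new HttpClient();
    }
}
```

With the limit raised to at least MaxDegreeOfParallelism, the first variant from the question may no longer be throttled to two requests at a time, though this is worth measuring rather than assuming.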

Pavel Mayorov

Answer 2 · 2016-12-21T15:58:27

Create four TransformBlocks instead of one, and give each its own HttpClient.

Use another block as the source.
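
A minimal sketch of this layout, assuming a BufferBlock as the source and a single ActionBlock as the shared sink. FanOutSketch, RunAsync, workers and handle are hypothetical names, not part of the original answer. Each fetcher gets BoundedCapacity = 1 so that a busy block postpones new messages and the source offers them to another block. Output order is not preserved, and error handling is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class FanOutSketch
{
    // Hypothetical helper: downloads every URL through `workers` TransformBlocks,
    // each owning its own HttpClient, and passes every response to `handle`.
    public static async Task RunAsync(IEnumerable<string> urls, int workers, Action<string> handle)
    {
        var source = new BufferBlock<string>(
            new DataflowBlockOptions { BoundedCapacity = 200 });
        var sink = new ActionBlock<string>(
            handle, new ExecutionDataflowBlockOptions { BoundedCapacity = 200 });

        var clients = new List<HttpClient>();
        var fetchers = new List<TransformBlock<string, string>>();
        for (int i = 0; i < workers; i++)
        {
            var client = new HttpClient();
            clients.Add(client);

            // BoundedCapacity = 1: while this block holds a message it postpones
            // further offers, so the source hands the message to a free block.
            var fetcher = new TransformBlock<string, string>(
                url => client.GetStringAsync(url),
                new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

            source.LinkTo(fetcher, new DataflowLinkOptions { PropagateCompletion = true });
            fetcher.LinkTo(sink);
            fetchers.Add(fetcher);
        }

        // SendAsync waits for free capacity instead of rejecting the item.
        foreach (var url in urls)
        {
            await source.SendAsync(url);
        }
        source.Complete();

        // Several blocks feed the sink, so complete it only after all of them finish.
        await Task.WhenAll(fetchers.Select(f => f.Completion));
        sink.Complete();
        await sink.Completion;

        foreach (var client in clients)
        {
            client.Dispose();
        }
    }
}
```

SendAsync is used here rather than Post because Post returns false when a bounded block is full, and the item is then silently dropped. The sink is completed manually because propagating completion from each fetcher would complete it as soon as the first fetcher finished.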

Pavel Mayorov

Firstly, I have already parallelized them (MaxDegreeOfParallelism = 4); secondly, that seems harder to wire together; thirdly, 4 is just for the sake of example, in reality I would want at least 30; fourthly, I need to preserve order, and in this case it is not guaranteed - Qutrix
@Qutrix what difference does it make, 4 or 30? Order is trickier, though... - Pavel Mayorov
I have not yet dealt with the "one source, many targets" case. Which target will a message be sent to? All of them? Just one? If one, then which one? - Qutrix
@Qutrix just one. Which one is unspecified (but certainly not one that is full). - Pavel Mayorov
