In my read-process-save workflow I found a bottleneck in the processing step: converting HTML to PDF takes the most time.
I seem to have managed to move the processing and saving steps onto a separate thread:
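
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;
    // HtmlToPdf comes from the PDF conversion library in use; its using directive was not shown.

    public class HtmlProcessor
    {
        private readonly string _htmlFolder;
        private readonly string _pdfFolder;

        public HtmlProcessor(string htmlFolder)
        {
            _htmlFolder = htmlFolder;

            // Create a folder to store the PDF files
            _pdfFolder = Path.Combine(_htmlFolder, "pdfs");
            if (!Directory.Exists(_pdfFolder))
            {
                Directory.CreateDirectory(_pdfFolder);
            }
        }

        public void Process()
        {
            var tasks = new List<Task>();
            foreach (var htmlFileName in Directory.GetFiles(_htmlFolder))
            {
                // Reading stays on the calling thread
                var htmlFileContent = File.ReadAllText(htmlFileName);

                // Converting and saving run together on a thread-pool thread
                tasks.Add(Task.Run(() =>
                {
                    var htmlToPdf = new HtmlToPdf();
                    var pdfDocument = htmlToPdf.ConvertHtmlString(htmlFileContent);
                    pdfDocument.Save(Path.Combine(_pdfFolder, Path.GetFileNameWithoutExtension(htmlFileName) + ".pdf"));
                }));
            }

            // Wait for every conversion; otherwise Main can return before the tasks finish
            Task.WaitAll(tasks.ToArray());
        }

        static void Main(string[] args)
        {
            // Verbatim string: "c:\htmlFilesFolder" does not compile because \h is not a valid escape
            var htmlProcessor = new HtmlProcessor(@"c:\htmlFilesFolder");
            htmlProcessor.Process();
        }
    }
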
Now I want to split off the processing step so that only this part runs in parallel. I do not know what the logic for this should look like. My guess is that it will be something like a thread within a thread.
So the question is: what does multithreading look like when you want to parallelize one step in the middle of a larger process?
TPL Dataflow. I am reading up on it now and trying to understand it ... - Adam
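
For what it's worth, here is a minimal sketch of how the TPL Dataflow idea from the comment above could be applied to the read-process-save case. It is only an illustration under some assumptions: the `HtmlPipeline` class, the `RunAsync` method and the `convert` delegate are hypothetical names that are not in the original code, and `convert` stands in for the real HTML-to-PDF call, assumed here to be able to return the PDF as bytes. Reading and saving each run one item at a time, and only the middle block is allowed to work on several items at once.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class HtmlPipeline
{
    // Hypothetical helper: 'convert' is a placeholder for the real HTML-to-PDF conversion.
    public static async Task RunAsync(string htmlFolder, string pdfFolder,
                                      Func<string, byte[]> convert)
    {
        Directory.CreateDirectory(pdfFolder);

        // Propagate completion so that finishing the first block eventually finishes the last one.
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Stage 1: read files one at a time (a block's default degree of parallelism is 1).
        var read = new TransformBlock<string, (string Name, string Html)>(
            path => (Path.GetFileNameWithoutExtension(path), File.ReadAllText(path)));

        // Stage 2: the slow step; this is the only block that processes items concurrently.
        var process = new TransformBlock<(string Name, string Html), (string Name, byte[] Pdf)>(
            item => (item.Name, convert(item.Html)),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            });

        // Stage 3: save results one at a time.
        var save = new ActionBlock<(string Name, byte[] Pdf)>(
            item => File.WriteAllBytes(Path.Combine(pdfFolder, item.Name + ".pdf"), item.Pdf));

        read.LinkTo(process, link);
        process.LinkTo(save, link);

        // Feed the pipeline, signal that no more input is coming, then wait for the last stage.
        foreach (var path in Directory.GetFiles(htmlFolder))
        {
            read.Post(path);
        }
        read.Complete();
        await save.Completion;
    }
}
```

With something like this, `Main` would call `HtmlPipeline.RunAsync(folder, Path.Combine(folder, "pdfs"), html => ...)` and wait for the returned task. The three blocks run independently, so while one file is being converted the next one can already be read, and no nested threads are needed. Nothing here limits how many files are buffered between stages; setting `BoundedCapacity` on the blocks is one way to cap memory use if the folder is large.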