cellsData is a collection (List) of data that needs to be written to an .xlsx file using OpenXML. Since there is a lot of data, we decided to split the writing to the worksheet across several threads.
The WriteCellInTable method writes its share of the data into the worksheet's sheetData.

The same cell may be written several times, and it is important that the latest value is the one that stays in the cell. With two or more threads writing to the same cell, the writes overlap and stale data may be left in the cell. How can this be avoided, and is there a solution at all?

    int prCount = Environment.ProcessorCount;
    Thread[] threads = new Thread[prCount - 1];
    int part = cellsData.Count / prCount;
    int begin = 0, thrNum = 0;
    for (int i = 0; i < prCount; i++)
    {
        if (i == prCount - 1)
        {
            // The last part takes whatever is left and runs in the current thread
            part = cellsData.Count - begin;
            WriteCellInTable(new Object[] { (Object)begin, (Object)part, (Object)sheetData });
            break;
        }
        // Every other part runs in a new thread
        threads[thrNum] = new Thread(WriteCellInTable);
        threads[thrNum].Start(new Object[] { (Object)begin, (Object)part, (Object)sheetData });
        begin += part;
        thrNum++;
    }
    Columns columns = new Columns();
    InsertColumnWidth(columns);
    MergeCells mergeCells = new MergeCells();
    SetMergeCell(mergeCells);
    // Wait for all worker threads before finishing the worksheet
    for (int i = 0; i < thrNum; i++)
        threads[i].Join();
    worksheet.Append(new SheetFormatProperties() { DefaultRowHeight = 15D, DyDescent = 0.25D });
  • How much is "a lot" of data? - klutch1991
  • I'm just curious how many items are in the collection if you expect a significant speed-up from writing to a single Excel file from several threads. The file you write to is a shared resource for those threads, so the write has to go inside a critical section, which prevents multiple threads from writing to the file at the same time. In that situation the only stage where you gain anything is converting the objects in the collection .. - klutch1991
  • @klutch1991, well, let's say I fill 300k cells with data. The code on 4 threads runs almost 4 times faster. If there are no overlaps (no cell is rewritten), everything is fine; if there are overlaps, it is a different story. - K. Kulahin
  • The overlaps arise because the threads access the file at the same time. To work correctly with a shared resource (in your case, the Excel file), write to it inside a lock () {} construct; then there will be no overlaps. But you are unlikely to get a significant performance gain. - klutch1991

1 answer

If the WriteCellInTable method accesses the same object from several threads simultaneously without any synchronization, the code is incorrect, and performance measurements of such code should not be taken into account. Read more about synchronization in this article.

But let's say you change the code by wrapping the body of WriteCellInTable in a lock construct. The conflicts will disappear, but the goal of improving performance will not be achieved. Multithreading only pays off when the threads do not compete for shared resources: if they do, each thread spends much of its time waiting for the resource to be released, and the work ends up being done almost sequentially.
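As a rough illustration of the lock approach, here is a minimal sketch that does not use OpenXML at all: the `sheet` dictionary, `LockedWriteSketch`, `WriteRange` and `Run` are hypothetical stand-ins, not the asker's actual WriteCellInTable or sheetData. It also assumes that "latest" means "later in cellsData". Note that a lock by itself only prevents simultaneous access; it does not decide which thread writes last, so the sketch stores the source index next to each value and overwrites only when the new entry comes later in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LockedWriteSketch
{
    // Hypothetical stand-in for the shared sheet: cell reference -> (source index, value).
    private static readonly Dictionary<string, (int Index, string Value)> sheet =
        new Dictionary<string, (int Index, string Value)>();
    private static readonly object sheetLock = new object();

    // Writes cells[begin .. begin + count) into the shared sheet, one lock per cell.
    public static void WriteRange(List<(string Ref, string Value)> cells, int begin, int count)
    {
        for (int i = begin; i < begin + count; i++)
        {
            lock (sheetLock)
            {
                // Overwrite only if this entry comes later in the source list,
                // so the result does not depend on thread scheduling.
                if (!sheet.TryGetValue(cells[i].Ref, out var current) || current.Index < i)
                    sheet[cells[i].Ref] = (i, cells[i].Value);
            }
        }
    }

    // Splits a tiny sample list between a worker thread and the current thread.
    public static string Run()
    {
        var cells = new List<(string Ref, string Value)>
        {
            ("A1", "old"), ("B1", "x"), ("A1", "newer"), ("A1", "latest")
        };
        var worker = new Thread(() => WriteRange(cells, 0, 2));
        worker.Start();
        WriteRange(cells, 2, 2);   // second half runs on the current thread
        worker.Join();
        return sheet["A1"].Value;  // "latest", whichever thread ran first
    }
}
```

Every write still passes through the same lock, so the threads take turns on the shared structure; the result is correct, but the speed-up is limited, as described above.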

To really improve performance, each thread has to work only with its own set of cells that no other thread touches (for example, one thread per sheet). Then you can expect some performance improvement.
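One possible way to get such disjoint sets, sketched under assumptions: first collapse duplicate cell references on a single thread so that only the last value for each cell survives, then let each worker build its own private chunk. All names here (`PartitionedWriteSketch`, `KeepLatest`, `BuildChunks`) are hypothetical, and the string built in the loop is only a stand-in for the expensive part of constructing a real cell; the chunks would still have to be appended to the actual worksheet by one thread afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionedWriteSketch
{
    // Step 1 (single-threaded): keep only the last value recorded for each cell reference.
    public static Dictionary<string, string> KeepLatest(IEnumerable<(string Ref, string Value)> cells)
    {
        var latest = new Dictionary<string, string>();
        foreach (var (cellRef, value) in cells)
            latest[cellRef] = value;   // later entries overwrite earlier ones
        return latest;
    }

    // Step 2 (parallel): each worker builds its own private chunk; no cell is shared.
    public static List<string>[] BuildChunks(Dictionary<string, string> latest, int workers)
    {
        var items = latest.ToArray();
        var chunks = new List<string>[workers];
        Parallel.For(0, workers, w =>
        {
            var local = new List<string>();
            for (int i = w; i < items.Length; i += workers)
                local.Add(items[i].Key + "=" + items[i].Value); // stand-in for building a cell
            chunks[w] = local;   // each worker writes only to its own array slot
        });
        return chunks;
    }
}
```

For example, `KeepLatest(new[] { ("A1", "old"), ("B1", "x"), ("A1", "latest") })` leaves two entries, with "A1" mapped to "latest". Because duplicates are resolved before the parallel stage, the workers never need a lock.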