I need to write a small application that will process a large number of files (40-50 thousand).

Multithreading is needed to improve performance, but how should it be implemented? Creating 40 thousand threads is clearly a bad option.

  • Use a thread pool. - free_ze
  • The general rule is twofold. First, there should be no more threads than cores/processors; otherwise context switching will eat up all the gains (the exception is when threads spend much of their time waiting on something). Second, if all threads are actively working with the disk, I'm afraid multithreading may bring no benefit at all: each thread will pull the disk in its own direction. In fact there is only one answer: experiment and profile. - Harry
  • @Harry So it turns out I should use 2-4 threads, each of which requests the path to the next file after it finishes processing one? - Vitali
  • The optimal number of threads can depend heavily on the hardware at hand, for example SSD versus HDD. So, as written above, just experiment with the number of threads and measure the speed. And yes, creating 40k threads is quite expensive (on 32-bit systems you will most likely top out somewhere around 600-700 threads; the OS simply will not allow more). - KoVadim
  • Modern operating systems cache disk data, which can significantly reduce the program's run time when it is restarted. So the effect of the parallelization itself will be visible on the second and subsequent runs, while on the first run disk access is likely to mask the speedup. In any case, run your own experiments on the specific system. Also decide which technology you intend to use: POSIX Threads, Qt or Boost, OpenMP/TBB and the like, or other facilities specific to languages other than C++. - iksemyonov

1 answer

The general rule is twofold. First, there should be no more threads than cores/processors; otherwise context switching will eat up all the gains (the exception is when threads spend much of their time waiting on something). Second, if all threads are actively working with the disk, I'm afraid multithreading may bring no benefit at all: each thread will pull the disk in its own direction. In fact there is only one answer: experiment and profile.
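To make the experiment concrete, here is a minimal sketch in standard C++11, not a definitive implementation. It starts a fixed number of worker threads; each one takes the index of the next unprocessed file from a shared atomic counter, so no thread is ever created per file. The function name `process_files` and the `process` callback are hypothetical names introduced for illustration; what `process` does with a path is up to your application.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Runs `process` on every path using a fixed number of worker threads.
// Returns the elapsed wall-clock time in seconds so that different
// thread counts can be compared.
// Assumption: `process` is thread-safe and does not throw (an exception
// escaping a std::thread terminates the program).
double process_files(const std::vector<std::string>& paths,
                     unsigned num_threads,
                     const std::function<void(const std::string&)>& process) {
    if (num_threads == 0) num_threads = 1;  // always run at least one worker

    std::atomic<std::size_t> next{0};  // index of the next unclaimed file

    auto worker = [&] {
        for (;;) {
            // Atomically claim one index; each file is handled exactly once.
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= paths.size()) break;  // nothing left to do
            process(paths[i]);
        }
    };

    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();  // wait until every file is processed

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call it with 1, 2, 4 and `std::thread::hardware_concurrency()` threads (which may return 0 if the value cannot be determined) and compare the returned times. Because of OS disk caching, repeat each measurement several times, since the first run is likely to be dominated by disk access.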