I have this code:

    folderEntries.forEach { entry ->
        entry.listFiles().filter { it.isFile }.forEach { files.add(it) }
    }
    files.subList(files.size * threads / threadCount, files.size * (threads + 1) / threadCount)
        .forEach { input.addAll(input.lastIndex + 1, Files.readAllLines(it.toPath())) }
    for (str in input) {
        buf = str.split(" ")
        outBuf.add("y = ${(Math.atan(buf[2].toDouble() / 4) - buf[3].toInt() * 62) / (buf[0].toInt() * buf[0].toInt() - buf[1].toInt())}")
    }

It runs slower than we would like. I would be grateful for help with optimization.

    2 answers

    This is quite a serious question, since the answer depends on the OS, hardware, and so on. However, there are some general recommendations that can help:

    I/O operations are best done:

    • Asynchronously. This reduces the number of context switches (and the waiting that results when blocking until the OS responds).
    • With a limited number of threads. When a large amount of data is read in parallel, the disk itself cannot keep up, so extra time is again spent on context switches, synchronization, and so on, only now on the disk controller's side.
    • As streams, that is, avoiding Files.readAllLines and the like. The main reason: ten files of 1 GB each can eat about 10 GB of memory. If the files are parsed on the fly, the required amount of memory drops. Moreover, a large array (or string, it does not matter) is allocated straight into the old generation (to avoid moving it around), which hurts both memory consumption and performance.
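A minimal sketch of the streaming and limited-threads points (the helper name `countLinesStreaming`, the pool size, and the sample data are illustrative, not from the question): each file is read lazily with `Files.lines`, so memory stays bounded, and the number of concurrent readers is capped explicitly.

```kotlin
import java.io.File
import java.nio.file.Files
import java.util.concurrent.Executors

// Illustrative helper: read every file lazily, with at most `threads`
// files being read at the same time.
fun countLinesStreaming(files: List<File>, threads: Int): Long {
    val pool = Executors.newFixedThreadPool(threads) // cap concurrency
    try {
        return files
            .map { file ->
                pool.submit<Long> {
                    // Files.lines is lazy: lines are produced one by one and
                    // discarded, so a 1 GB file does not need 1 GB of heap.
                    Files.lines(file.toPath()).use { it.count() }
                }
            }
            .sumOf { it.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val sample = File.createTempFile("demo", ".txt").apply {
        writeText("1 2 3 4\n5 6 7 8\n")
        deleteOnExit()
    }
    println(countLinesStreaming(listOf(sample), threads = 2)) // prints 2
}
```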

    In short: use the code from the other answer, but slightly modified (there is no point in processing the lines themselves in a separate parallel block). Moreover, concurrency must be limited, and that already requires tests on the target hardware.

      File("/tmp")
          .walkTopDown()
          .filter { it.isFile }        // Files.lines fails on directories
          .asStream()
          .parallel()                  // note: the stream API has no parallel(N) overload
          .flatMap { Files.lines(it.toPath()) }
          .flatMap { line -> line.split(" ").stream() }

    In detail: you need to apply all the points above. In effect, you will end up with a whole study of working with IO in Java, similar to this one for .NET.
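Since `java.util.stream` has no `parallel(N)` overload, one widely used (though not officially specified) trick for bounding a parallel stream's concurrency is to submit the whole pipeline to a dedicated `ForkJoinPool`: the pipeline then runs on that pool's workers instead of the common pool. A small sketch, where the helper name `withParallelism`, the pool size, and the summing task are illustrative:

```kotlin
import java.util.concurrent.ForkJoinPool

// Run a task (e.g. a parallel-stream pipeline) inside a dedicated
// ForkJoinPool so that its parallelism is capped by the pool size
// rather than by the shared common pool.
fun <T> withParallelism(parallelism: Int, task: () -> T): T {
    val pool = ForkJoinPool(parallelism)
    try {
        return pool.submit<T> { task() }.get()
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val sum = withParallelism(2) {
        // This parallel pipeline uses at most ~2 worker threads.
        (1..100).toList().parallelStream().mapToInt { it }.sum()
    }
    println(sum) // prints 5050
}
```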

    • Thank you so much for the useful information - Andrey

    A solution with parallel streams suggests itself, something like:

     File("/tmp")
         .walkTopDown()
         .filter { it.isFile }       // Files.lines fails on directories
         .asStream()
         .parallel()
         .map { Files.lines(it.toPath()) }
         .flatMap { it }
         .parallel()
         .map { it.split(" ").stream() }
         .flatMap { it }
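Wiring that pipeline to the formula from the question gives an end-to-end sketch. The function name `process`, the temp-directory demo, and the assumption that each line holds four space-separated numbers (taken from the original snippet) are illustrative:

```kotlin
import java.io.File
import java.nio.file.Files
import java.util.stream.Collectors
import kotlin.streams.asStream

// Sketch: walk a directory in parallel, stream each file's lines lazily,
// and apply the y = ... formula from the question to every line.
fun process(dir: File): List<String> =
    dir.walkTopDown()
        .filter { it.isFile }              // Files.lines fails on directories
        .asStream()
        .parallel()
        .flatMap { Files.lines(it.toPath()) } // lazy: bounded memory per file
        .map { line ->
            val buf = line.split(" ")      // assumed layout: four numbers
            "y = ${(Math.atan(buf[2].toDouble() / 4) - buf[3].toInt() * 62) /
                    (buf[0].toInt() * buf[0].toInt() - buf[1].toInt())}"
        }
        .collect(Collectors.toList())

fun main() {
    val dir = Files.createTempDirectory("demo").toFile()
    File(dir, "in.txt").writeText("4 2 8 1\n")
    val out = process(dir)
    println(out.size)                  // prints 1
    println(out[0].startsWith("y = ")) // prints true
}
```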