Briefly, about what I tried to do in the code below. I wanted to write a parser for a large CSV database. I decided to use the Apache Commons CSV library and implement the following algorithm:

1. Read the database record by record and store the records in a collection.
2. When the collection reaches a certain size, reading is paused and the collection is checked with a regular expression.
3. The regex checks the lines for e-mail addresses and writes them into a text document.

Then, to speed up the process, I decided to use a thread pool. I read that the java.util.concurrent package lets you create a pool of threads via the ExecutorService interface, so I reformulated the algorithm as follows: everything starts as before, the collection is filled, then it is handed to a worker in one of the pool's threads, and the worker checks it with the regular expression. According to my plan the workers should receive and parse the collections in parallel with each other, but everything happens sequentially. If one thread is busy processing a large batch of data, the others do not start in parallel with it. Here is what is printed when the program runs:

    Worker 1 started
    Thread 1 stopped
    Worker 2 started
    Thread 2 stopped
    Worker 3 started

On the third worker, the thread picks up a batch that requires a long check against my database and just grinds away. In theory workers 4, 5, and so on should start in parallel in the pool, but this does not happen. Apparently I have not fully understood how to work with this package... please tell me where the mistake could be.

    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVRecord;
    import java.io.FileReader;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TestThreadPool {
        static int executorInProgressCounter;

        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newFixedThreadPool(10); // creating a pool of 10 threads
            String path = "DB adres.csv";
            Reader in = new FileReader(path);
            Iterable<CSVRecord> records = CSVFormat.MYSQL.parse(in);
            List<CSVRecord> tableStr = new ArrayList<>();
            for (CSVRecord record : records) {
                tableStr.add(record);
                if (tableStr.size() == 5) {
                    Runnable worker = new WorkerThread(tableStr);
                    executor.execute(worker);
                }
            }
            executor.shutdown();
        }
    }

Class "workers":

    import org.apache.commons.csv.CSVRecord;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class WorkerThread implements Runnable {
        static int docNum = 1;
        static int workerCount = 0;
        List<CSVRecord> lineList = new ArrayList<>();
        public static final Pattern PATTERN_MAIL = Pattern.compile(
                "([a-z0-9_-]+\\.)*[a-z0-9_-]+@[a-z0-9_-]+(\\.[a-z0-9_-]+)*\\.[a-z]{2,6}");
        Matcher getMail;

        public WorkerThread(List<CSVRecord> list) {
            this.lineList = list;
        }

        public void run() {
            workerCount++;
            System.out.println("Worker " + workerCount + " started");
            File file = new File("doc name " + docNum + ".text");
            docNum++;
            try {
                FileWriter fileWriter = new FileWriter(file.getAbsoluteFile());
                BufferedWriter bufWriter = new BufferedWriter(fileWriter);
                for (int i = 0; i < lineList.size(); i++) {
                    // search each record's string form for mail addresses
                    getMail = PATTERN_MAIL.matcher(lineList.get(i).toString());
                    while (getMail.find()) {
                        bufWriter.write(lineList.get(i).toString().substring(getMail.start(), getMail.end()) + ";");
                    }
                }
                lineList.clear();
                bufWriter.close();
                System.out.println("Thread " + workerCount + " stopped");
            } catch (IOException e) {
                System.out.println("Error occurred in creating the file");
            }
        }
    }

    1 Answer

    The condition

     if (tableStr.size() == 5) {

    will only be true once, because the collection keeps swelling past 5 and is never emptied in the reading loop (the three workers most likely appeared because the worker itself calls lineList.clear() on the very same shared list, letting its size reach 5 again). Besides,

     executor.shutdown(); 

    will be called immediately after all the records have been read (but not necessarily after they have been processed). On top of that, the very same collection object is handed to every worker, so the workers all end up doing the same work.
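    A minimal sketch of the reading loop fixed along those lines: give each worker its own copy of the batch, reset the producer's list, flush the incomplete tail, and wait for the pool to drain. The >= comparison, the tail flush, and the one-hour timeout are illustrative additions, not part of the original code:

        import org.apache.commons.csv.CSVFormat;
        import org.apache.commons.csv.CSVRecord;
        import java.io.FileReader;
        import java.io.Reader;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class TestThreadPool {
            public static void main(String[] args) throws Exception {
                ExecutorService executor = Executors.newFixedThreadPool(10);
                try (Reader in = new FileReader("DB adres.csv")) {
                    Iterable<CSVRecord> records = CSVFormat.MYSQL.parse(in);
                    List<CSVRecord> batch = new ArrayList<>();
                    for (CSVRecord record : records) {
                        batch.add(record);
                        if (batch.size() >= 5) {
                            // hand the worker its own copy so the two threads never share a list
                            executor.execute(new WorkerThread(new ArrayList<>(batch)));
                            batch.clear(); // reset the producer's batch so the check can fire again
                        }
                    }
                    if (!batch.isEmpty()) {
                        executor.execute(new WorkerThread(new ArrayList<>(batch))); // flush the tail
                    }
                }
                executor.shutdown();                          // no new tasks accepted
                executor.awaitTermination(1, TimeUnit.HOURS); // wait for submitted work to finish
            }
        }

    Copying the batch before submitting it is what prevents the workers and the reading thread from stepping on the same list.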

    In general, you would be better off using queues to transfer data between threads (a sketch follows at the end of this thread).

    • That is what a fresh pair of eyes is worth. You were right about the collection, I did not foresee that moment. I put tableStr.clear(); after executor.execute(worker); and things got off the ground, but there is still work to do. You wrote that one and the same collection would be handed to all the workers. Why? The collection is cleared right there after the hand-off. And about the queues: I take it you mean the Queue collection? What is its advantage? - Iga
    • "You wrote that one and the same collection will be transferred to all workers. Why? The collection is cleared right there after the hand-off" - because you are passing the same object. In a multithreaded environment the order of actions is not guaranteed, and in practice some threads will end up seeing the same data. "And about the queues. I take it you mean the Queue collection? What is its advantage?" - the advantage is that instead of handing over whole collections of items you can transfer them one by one, and instead of splitting the load manually you can leave that to the queue. - etki
    • I see what you are driving at. A good idea. I had a look and found a neat thing in java.util.concurrent - tutorials.jenkov.com/java-util-concurrent/blockingqueue.html - I will probably try to put something together with it. Thank you very much for the recommendations. You helped me - Iga
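    For completeness, a minimal sketch of the producer-consumer layout discussed above, assuming an ArrayBlockingQueue of record strings and an identity-compared poison-pill string that tells each consumer to stop. The class name QueueDemo, the queue capacity, the consumer count, the simplified mail regex, and printing to the console instead of a file are all illustrative choices:

        import org.apache.commons.csv.CSVFormat;
        import org.apache.commons.csv.CSVRecord;
        import java.io.FileReader;
        import java.io.Reader;
        import java.util.concurrent.ArrayBlockingQueue;
        import java.util.concurrent.BlockingQueue;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class QueueDemo {
            // identity-compared sentinel telling a consumer there is nothing more to read
            private static final String POISON_PILL = new String("POISON");
            private static final Pattern MAIL =
                    Pattern.compile("[a-z0-9._-]+@[a-z0-9_-]+(\\.[a-z0-9_-]+)*\\.[a-z]{2,6}");

            public static void main(String[] args) throws Exception {
                // bounded queue: put() blocks when full, so the reader cannot outrun the workers
                BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
                int consumers = 4;

                for (int i = 0; i < consumers; i++) {
                    new Thread(() -> {
                        try {
                            while (true) {
                                String line = queue.take();     // blocks until an item arrives
                                if (line == POISON_PILL) break; // end-of-input marker
                                Matcher m = MAIL.matcher(line);
                                while (m.find()) {
                                    System.out.println(m.group() + ";");
                                }
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }).start();
                }

                try (Reader in = new FileReader("DB adres.csv")) {
                    for (CSVRecord record : CSVFormat.MYSQL.parse(in)) {
                        queue.put(record.toString()); // hand records over one at a time
                    }
                }
                for (int i = 0; i < consumers; i++) {
                    queue.put(POISON_PILL); // one pill per consumer so each one stops
                }
            }
        }

    A bounded queue also gives back-pressure for free: when the consumers fall behind, put() blocks the reading thread instead of letting the record list grow without limit.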