Briefly, what I am trying to do in the code below: I wanted to write a parser for a large CSV database. I decided to use the Apache Commons CSV library and implemented the following algorithm:

1. Read the database record by record and store the records in a collection.
2. When the collection reaches a certain size, pause reading and check the collection against a regular expression.
3. The regular expression checks the lines for email addresses and writes the matches to a text file.

Then, to speed things up, I decided to use a thread pool. I read that the java.util.concurrent package lets you create a pool of threads via the ExecutorService interface, so I reworked the algorithm: everything starts as before, the collection is filled, then it is handed to a worker running on one of the pool's threads, and the worker checks it against the regular expression. According to my plan, the workers should receive and parse the collections in parallel, but everything happens sequentially: when one thread is busy processing a large batch of data, the others do not start alongside it. Here is the output at runtime:
```
Worker 1 started
Thread 1 stopped
Worker 2 started
Thread 2 stopped
Worker 3 started
```

On the third worker, the thread receives a batch that triggers a long scan of my database and grinds away for a long time. In theory, workers 4, 5, and so on should start in parallel in the pool, but that does not happen. Apparently I have not fully understood how to work with this package... please tell me where my mistake is.
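Just to rule out the pool itself, I put together a minimal standalone check (the class and task names here are made up for illustration and are not part of my parser): four tasks that each sleep half a second on a four-thread pool. If the pool really runs them in parallel, all four finish well inside one second.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSanityCheck {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CountDownLatch latch = new CountDownLatch(4);
        for (int i = 1; i <= 4; i++) {
            final int id = i;
            executor.execute(() -> {
                System.out.println("Task " + id + " on " + Thread.currentThread().getName());
                try {
                    Thread.sleep(500); // simulate work
                } catch (InterruptedException ignored) {
                }
                latch.countDown();
            });
        }
        // Sequential execution would take ~2 s; parallel execution finishes in ~0.5 s.
        boolean allDone = latch.await(1, TimeUnit.SECONDS);
        System.out.println(allDone ? "tasks ran in parallel" : "tasks ran sequentially");
        executor.shutdown();
    }
}
```

This check passes on my machine, so the pool itself seems fine and the problem must be somewhere in my own code below.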
The main class:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import java.io.FileReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestThreadPool {
    static int executorInProgressCounter;

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(10); // creating a pool of 10 threads
        String path = "DB adres.csv";
        Reader in = new FileReader(path);
        Iterable<CSVRecord> records = CSVFormat.MYSQL.parse(in);
        List<CSVRecord> tableStr = new ArrayList<>();
        for (CSVRecord record : records) {
            tableStr.add(record);
            if (tableStr.size() == 5) {
                Runnable worker = new WorkerThread(tableStr);
                executor.execute(worker);
            }
        }
        executor.shutdown();
    }
}
```

The worker class:
```java
import org.apache.commons.csv.CSVRecord;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkerThread implements Runnable {
    static int docNum = 1;
    static int workerCount = 0;
    List<CSVRecord> lineList = new ArrayList<>();
    static public final Pattern PATTERN_MAIL =
            Pattern.compile("([a-z0-9_-]+\\.)*[a-z0-9_-]+@[a-z0-9_-]+(\\.[a-z0-9_-]+)*\\.[a-z]{2,6}");
    Matcher getMail;

    public WorkerThread(List<CSVRecord> list) {
        this.lineList = list;
    }

    public void run() {
        workerCount++;
        System.out.println("Worker " + workerCount + " started");
        File file = new File("doc name " + docNum + ".text");
        docNum++;
        try {
            FileWriter fileWriter = new FileWriter(file.getAbsoluteFile());
            BufferedWriter bufWriter = new BufferedWriter(fileWriter);
            for (int i = 0; i < lineList.size(); i++) {
                getMail = PATTERN_MAIL.matcher(lineList.get(i).toString());
                while (getMail.find()) {
                    bufWriter.write(lineList.get(i).toString().substring(getMail.start(), getMail.end()) + ";");
                }
            }
            lineList.clear();
            bufWriter.close();
            System.out.println("Thread " + workerCount + " stopped");
        } catch (IOException e) {
            System.out.println("Error occurred in creating the file");
        }
    }
}
```
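For completeness, I also checked the email pattern on its own, and matching itself works as I expect. This is just a standalone sketch: the helper class, the `extract` method, and the sample line are mine for illustration, not part of the parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailPatternCheck {
    // Same pattern as in WorkerThread.
    static final Pattern PATTERN_MAIL =
            Pattern.compile("([a-z0-9_-]+\\.)*[a-z0-9_-]+@[a-z0-9_-]+(\\.[a-z0-9_-]+)*\\.[a-z]{2,6}");

    // Collect every match in the line, separated by semicolons.
    static String extract(String line) {
        StringBuilder sb = new StringBuilder();
        Matcher m = PATTERN_MAIL.matcher(line);
        while (m.find()) {
            sb.append(m.group()).append(";");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("id=42;john.doe@example.com;extra")); // prints john.doe@example.com;
    }
}
```

So the regular expression finds the addresses; the problem is only that the workers do not run in parallel.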