I am writing a program to count words in a text file. The problem is that, when the program is executed, it eventually displays 0 unique words (or a number not exceeding 5,000 thousand). I can not understand what the problem is. I took War and Peace as a text file (to enlarge characters, I copied it 5 times).

Code:

static int i =0; static Map<String, Integer> mainMap = new ConcurrentHashMap<String, Integer>(); public static void main(String[] args) throws Exception { long startTime = System.currentTimeMillis(); String PathToFile = "D:\\WarAndPeace1.txt"; ArrayList<String> wordsTemp = new ArrayList<String>(); try { File file = new File(PathToFile); Scanner scanner = new Scanner(file); while (scanner.hasNext()) { i++; if (wordsTemp.size()<=500000) { String FinalWord = convertedWord(scanner.next().toLowerCase()); if (!FinalWord.equals("")||FinalWord!=null) wordsTemp.add(FinalWord); } else{ Runnable r = new NewThread(wordsTemp); new Thread(r).start(); wordsTemp.clear(); wordsTemp.add(scanner.next().toLowerCase()); } } scanner.close(); } catch (FileNotFoundException e) { e.printStackTrace(); } for (Map.Entry<String, Integer> e : mainMap.entrySet()) { System.out.println(e.getKey() + ":" + e.getValue()); } System.out.println("Уникальных слов: "+mainMap.size()); System.out.println("Всего слов: "+i); long timeSpent = System.currentTimeMillis() - startTime; System.out.println("программа выполнялась " + timeSpent + " миллисекунд"); } // Создание второго потока static class NewThread implements Runnable { Thread thread; //Map<String, Integer> map; ArrayList <String> strings; // Конструктор NewThread(ArrayList <String> strings) { // Создаём новый второй поток this.strings = strings; thread = new Thread(this, "Поток для примера"); thread.start(); } public void run() { Map<String, Integer> mTempMap = wordCount(strings, mainMap); synchronized (this) { mainMap = mTempMap; } } } public static Map<String, Integer> wordCount( ArrayList <String> strings, Map<String, Integer> map) { Map<String, Integer> mapTemp = map; String s; for (int i = 0;strings.size()>i;i++){ s = strings.get(i); if (s!=null) { if (!map.containsKey(s)) { // first time we've seen this string map.put(s, 1); } else { int count = map.get(s); map.put(s, count + 1); } } } return map; } //Преобразование public static String convertedWord (String word){ String result = word.replaceAll("\\p{Punct}", "");//регулярным выражением заменяем буквы(\\w) и цифры(\\d) на пустую строку, то есть удаляем. return result; } 
  • What do you think happens to NewThread.strings after calling wordsTemp.clear(); ? - andreycha
  • truly noticed a colleague above. It looks like you are clearing an object by reference. - Senior Pomidor

1 answer 1

I will write a few questions, comments:

  1. What do you think happens to NewThread.strings after calling wordsTemp.clear(); ? And what happens if you replace it with wordsTemp = new ArrayList<String>(); ?
  2. The wordCount() method can produce incorrect results, because the code inside the if (s!=null) condition is not atomic and not synchronized. For example, if three threads come to this section of the code for the same word, then after the completion of this section, the map may not consist of count + 3 , but only count + 1 :

     int count = map.get(s); map.put(s, count + 1); 

    It is better to use the compute() method.

  3. synchronized around link assignment is not needed - this operation is both atomic. And the assignment itself is not necessary - see the following. paragraph.

  4. Do not confuse yourself in the wordCount() method. You accept the map, assign it to an unused variable, modify it, return it, and then re-assign it. Just modify and that's it. Clean the code (all), you look and it will be easier to search for errors.
  • Thank you, the first point helped right away (question-comment). - Egor Chetvernin
  • @EgorChetvernin please. - andreycha