I need to optimize code that works with String and BufferedReader
As I understand it, the problem is the large number of new String instances being created. If possible, please explain how to write this code correctly and why.

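    // Each public class goes in its own file; the imports cover both.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FilenameFilter;
    import java.io.IOException;
    import java.util.StringTokenizer;
    import java.util.TreeSet;

    // Accepts only files whose names end with the given extension.
    public class ExtensionFilter implements FilenameFilter {
        private String extension;

        ExtensionFilter(String extension) {
            this.extension = extension;
        }

        public boolean accept(File dir, String name) {
            String f = new File(name).getName();
            // endsWith rather than indexOf(...) != -1, which would also
            // accept names such as "Foo.java.bak".
            return f.endsWith(extension);
        }
    }

    public class NonUniqueWords {
        public static void main(String[] args) {
            // The set keeps one copy of every distinct token.
            TreeSet<String> treeSet = new TreeSet<>();
            String dirName = "D://PROJECT";
            String extension = ".java";
            String s;
            FilenameFilter filter = new ExtensionFilter(extension);
            File dir = new File(dirName);
            String[] filenames = dir.list(filter);
            // list() returns null if the directory does not exist or cannot be read.
            if (filenames == null) {
                System.out.println("Cannot read directory: " + dirName);
                return;
            }
            for (String filename : filenames) {
                try (BufferedReader br = new BufferedReader(new FileReader(new File(dir, filename)))) {
                    while ((s = br.readLine()) != null) {
                        // Split each line on whitespace and collect the tokens.
                        StringTokenizer tokenizer = new StringTokenizer(s);
                        while (tokenizer.hasMoreTokens()) {
                            String token = tokenizer.nextToken();
                            treeSet.add(token);
                        }
                    }
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                    ex.printStackTrace();
                }
            }
            System.out.println("The amount of non-unique words: " + treeSet.size());
        }
    }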
  • And where do you create many instances of the string? - pavel163
  • And what's the problem then? - rjhdby
  • @pavel163, @rjhdby I thought the problem was the large number of strings created by String token = tokenizer.nextToken(); I have about 463 items - Oleg

1 answer

I do not see any problem with strings here. As an optimization, I can suggest using HashSet instead of TreeSet.

TreeSet keeps its elements sorted, but adding an element takes logarithmic time. HashSet does not define the order of its elements, and adding an element takes constant time on average.

Given that the order of elements is not important for your algorithm, this replacement will speed up the execution of the code, but will not affect the result.
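To illustrate the replacement, here is a minimal sketch, not the asker's exact program. The class name DistinctWordCount and the helper countDistinct are hypothetical, and the input is a hard-coded list of lines instead of files read from disk. It runs the same tokenizing loop against both set types and shows that the resulting size is the same.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

public class DistinctWordCount {

    // Hypothetical helper: adds every whitespace-separated token of every
    // line to the given set and returns the number of distinct tokens.
    static int countDistinct(Set<String> words, List<String> lines) {
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                words.add(tokenizer.nextToken());
            }
        }
        return words.size();
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines read from the files.
        List<String> lines = Arrays.asList(
                "to be or not to be",
                "that is the question");

        int withTree = countDistinct(new TreeSet<>(), lines); // sorted, O(log n) add
        int withHash = countDistinct(new HashSet<>(), lines); // unordered, O(1) average add

        // Both sets hold the same distinct tokens, so the sizes match.
        System.out.println(withTree + " " + withHash); // prints "8 8"
    }
}
```

Because the helper takes the Set interface, switching between TreeSet and HashSet is a one-line change at the call site. For a few hundred tokens the difference in running time is likely to be negligible; it only becomes noticeable on much larger inputs.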