I have a text file containing lines exported from a website. They look like this:

2221214981 2221214981 In the liquidation phase 2017-06-19T00:00:00

7814249543 7814249543 A decision was made on the upcoming exclusion of an inactive legal entity from USREX No. 30186 of 01/09/2017. Published in the Journal of the State Registration No. 09/06/2017 No. 35 2017-09-01T00:00:00

2539045051 2539045051 Is in the process of reorganization in the form of a merger with another LE 2016-11-30T00:00:00

There are four tab-separated columns. The first two are identical and exist for my own verification: the first is the INN I requested, the second is the INN the site returned. The third column is some text, and the fourth is the date.

I need to pull out the unique text lines for further analysis (from the third column).

I know how to do this with database queries, but I want to do it in Java.

How can this be done, or where should I look to understand it?

  • Look into regular expressions. Parsing is also possible with jsoup. - V.March 2008
  • Parsing doesn't need regexps here - tutankhamun

3 answers

With Java 8 and the Stream API, this can be done as follows:

    Files.lines(Paths.get("/path/to/file.txt"))  // read the file line by line
            .map(s -> s.split("\t")[2])           // split on tabs and take the third "column"
            .distinct()                           // keep only the unique values
            .forEach(System.out::println);        // print the result

To write the result to a file, you can write this:

    // Read the unique strings
    Stream<String> lines = Files.lines(Paths.get("/path/to/file.txt"))
            .map(s -> s.split("\t")[2])
            .distinct();
    // Write them to a new file
    Files.write(Paths.get("/path/to/result.txt"), (Iterable<String>) lines::iterator);
  • It's not entirely clear what the leading dots are for. Maybe I'm missing some library? - Uhntiss
  • @Uhntiss it is a single statement, split across lines at the dots. If you have Java 8 or higher, everything should work. - Alex Chermenin
  • @Uhntiss, this is a fluent interface. - D-side
  • Thanks! It all worked! How do I correctly write the result to a file? What if I write it like this: .forEach(Files.write(Paths.get("/my/new/file/path.txt"), lines, Charset.forName("UTF-8")) - Uhntiss
  • @Uhntiss I added a description of how to write to a file to the answer - Alex Chermenin
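
For completeness, the stream approach from this answer could be wrapped into a small standalone program along the following lines. This is only a sketch under a few assumptions: the class name `UniqueThirdColumn`, the helper `uniqueThirdColumn`, and passing the paths as command-line arguments are all made up for illustration, and the file is assumed to be UTF-8. It also skips lines with fewer than three columns (blank lines, for instance), which the one-liner above would fail on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UniqueThirdColumn {

    // Hypothetical helper: returns the distinct values of the third
    // tab-separated column, in order of first appearance.
    static List<String> uniqueThirdColumn(Path input) throws IOException {
        // try-with-resources closes the file that Files.lines keeps open
        try (Stream<String> lines = Files.lines(input, StandardCharsets.UTF_8)) {
            return lines
                    .map(line -> line.split("\t"))
                    .filter(cols -> cols.length > 2) // skip blank or short lines
                    .map(cols -> cols[2])
                    .distinct()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the input file, args[1] is the output file
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        Files.write(output, uniqueThirdColumn(input), StandardCharsets.UTF_8);
    }
}
```

Collecting into a list before writing keeps the reading and writing steps separate, so the input file is already closed by the time the output is written.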

To find unique strings, use the HashSet class. When an item is added, its hash is computed, and if an equal item is already present in the collection, the new one is not added.

For example:

    Set<String> unique = new HashSet<String>();
    unique.add("Monday");
    unique.add("Friday");
    unique.add("Monday");
    unique.add("Sunday");
    unique.add("Sunday");
    for (String s : unique) {
        System.out.println(s);
    }

This prints each value once (the order may differ, because HashSet does not guarantee iteration order):

    Monday
    Friday
    Sunday
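
If the order of first appearance matters, one option is `LinkedHashSet`, which behaves like `HashSet` but iterates in insertion order. The sketch below also uses the fact that `Set.add` returns `false` when an equal element is already present. The class name `OrderedUnique` and the method `unique` are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedUnique {

    // Hypothetical helper: returns the distinct values in first-seen order
    static Set<String> unique(List<String> values) {
        Set<String> seen = new LinkedHashSet<>();
        for (String value : values) {
            // add() returns false when an equal element is already in the set
            if (!seen.add(value)) {
                System.out.println("duplicate skipped: " + value);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> days = Arrays.asList("Monday", "Friday", "Monday", "Sunday", "Sunday");
        // Prints two "duplicate skipped" lines, then [Monday, Friday, Sunday]
        System.out.println(unique(days));
    }
}
```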

    Set<String> uniq = new HashSet<String>();
    for (int i = 0; i < inputStringArray.length; i++) {
        // index 2 is the third (text) column; index 3 would be the date
        uniq.add(inputStringArray[i].split("\\t")[2]);
    }
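
The loop above assumes the lines are already in an array called `inputStringArray`. One possible way to get the same result straight from the file, without streams, is sketched below. The class name `ThirdColumnSet`, the method `readUnique`, and the UTF-8 encoding are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

class ThirdColumnSet {

    // Hypothetical helper: collects the distinct third-column values of a file
    static Set<String> readUnique(Path input) throws IOException {
        Set<String> uniq = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\t");
                // Guard against blank or short lines before indexing
                if (cols.length > 2) {
                    uniq.add(cols[2]);
                }
            }
        }
        return uniq;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to the tab-separated file
        for (String text : readUnique(Paths.get(args[0]))) {
            System.out.println(text);
        }
    }
}
```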