I need to parse an HTML file without third-party libraries. How can I do this?

  • PHP is not an option :) - Pavel
  • What do you need? - minority
  • Then file_get_contents won't do either :) - Sh4dow
  • Does that parse HTML in any way? IMHO it just reads the file (or part of it) into a string. - avp
  • @minority Reading is never called parsing. Reading is just reading; “parsing HTML” usually means turning the text into a tree of objects/arrays. Parsing in general is converting text into a data structure. - Sh4dow
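To make that distinction concrete, here is a minimal sketch (in Java, the language used in the answer below) of parsing in Sh4dow's sense: a regex tokenizer plus a stack that turns markup into a tree of nodes, using only the JDK. The names MiniHtmlTree, Node and parse are hypothetical, and the sketch assumes simple, mostly well-formed markup: it does not handle comments, <!DOCTYPE>, <script> contents or character entities, and it knows only a handful of void elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniHtmlTree {
    // A node is either an element (tag != null) or a text node (text != null).
    static final class Node {
        final String tag;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    // Elements that never get a closing tag, so they must not be pushed on the stack.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    // Alternatives: closing tag </name>, opening tag <name ...> (group 3 is "/" if self-closing), or a text run.
    private static final Pattern TOKEN = Pattern.compile(
            "</([a-zA-Z][a-zA-Z0-9]*)\\s*>|<([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>|([^<]+)");

    static Node parse(String html) {
        Node root = new Node("#root", null);
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                // Closing tag: pop back to the matching open element; ignore it if nothing matches.
                String name = m.group(1).toLowerCase();
                if (stack.stream().anyMatch(n -> name.equals(n.tag))) {
                    while (!name.equals(stack.pop().tag)) {
                        // elements left unclosed inside it are closed implicitly
                    }
                }
            } else if (m.group(2) != null) {
                // Opening tag: attach to the current parent, descend into it unless it is void/self-closing.
                String name = m.group(2).toLowerCase();
                Node element = new Node(name, null);
                stack.peek().children.add(element);
                boolean selfClosing = !m.group(3).isEmpty();
                if (!selfClosing && !VOID.contains(name)) {
                    stack.push(element);
                }
            } else {
                // Text run: keep it only if it is not pure whitespace.
                String text = m.group(4).trim();
                if (!text.isEmpty()) {
                    stack.peek().children.add(new Node(null, text));
                }
            }
        }
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static void print(Node node, String indent, StringBuilder out) {
        out.append(indent)
           .append(node.tag != null ? "<" + node.tag + ">" : "\"" + node.text + "\"")
           .append('\n');
        for (Node child : node.children) {
            print(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        Node root = parse("<html><body><h1>Title</h1><p>Hello <b>world</b><br>!</p></body></html>");
        StringBuilder out = new StringBuilder();
        print(root, "", out);
        System.out.print(out);
    }
}
```

For the sample string in main, the printed tree nests "Title" under <h1>, and "Hello", <b> (containing "world"), <br> and "!" under <p>. A real HTML parser has to cope with far messier input, which is why this is only an illustration of the idea.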

2 answers

A simple usage example (taken from a similar thread):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    final String WORD = "[a-zA-Zа-яА-Я]+";
    Pattern pattern = Pattern.compile(WORD);
    Matcher matcher = pattern.matcher(externalText); // put your own text here
    while (matcher.find()) {
        System.out.println(matcher.group(0));
    }

This snippet prints every word found in the text, where a word is a run of the characters a-zA-Zа-яА-Я (Latin and Cyrillic letters). How to write a regular expression pattern is described in the JavaDoc; the key classes are Pattern and Matcher.
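A self-contained version of the snippet above, wrapped in a hypothetical findWords method so it compiles and runs on its own (the source file is assumed to be saved and compiled as UTF-8 because of the Cyrillic range). Running it on a fragment of markup also shows that this is matching, not parsing: tag names such as p come out as "words" too, and no document structure is recovered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFinder {
    // Same pattern as in the answer: runs of Latin or Cyrillic letters.
    // Note: the range а-я does not include ё/Ё; \p{L}+ is one alternative that matches letters of any script.
    private static final Pattern WORD = Pattern.compile("[a-zA-Zа-яА-Я]+");

    // Returns every match of WORD in the given text, in order of appearance.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group()); // group() is the same as group(0): the whole match
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("<p>Hello, мир! 42</p>")); // [p, Hello, мир, p]
    }
}
```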

  • Well, be prepared: searching for complex structures will require complex regular expressions. - Viacheslav
  • OK, thanks. I have a feeling it won't be that easy... - Pavel
  • It's not that hard. An example with PCRE (Perl-compatible regular expressions) that looks for everything that is not a tag: $reg = '/>([^<]*)</'; =) - Sh4dow
  • Probably :) I'll take a look... - Pavel
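The PCRE pattern from the comment carries over to java.util.regex unchanged; only the PHP delimiters '/.../' are dropped. A minimal sketch with the hypothetical names TextBetweenTags and textNodes; like any regex-only approach, it will misfire on '<' or '>' inside attribute values, comments or <script> blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextBetweenTags {
    // Java equivalent of the PCRE pattern '/>([^<]*)</': whatever lies between a '>' and the next '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">([^<]*)<");

    // Collects the non-blank text found between tags, in document order.
    static List<String> textNodes(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = BETWEEN_TAGS.matcher(html);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) {
                result.add(text); // skip the empty gaps between adjacent tags such as "</li><li>"
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(textNodes("<ul><li>One</li><li>Two</li></ul>")); // [One, Two]
    }
}
```

Each match consumes the '<' that starts the next tag, but the text after that tag begins at its own '>', so no text run is skipped.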

Is anyone actually forbidding you to use third-party libraries? 0_o

  • Well, I need to understand how it's done... - Pavel
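If Java is the target and "without third-party libraries" only rules out extra JARs, the JDK itself ships a dated, HTML 3.2-era callback parser in javax.swing.text.html.parser (module java.desktop). A minimal sketch that just records the events the parser reports; the class name JdkHtmlParse and the event strings are illustrative, and modern HTML5 markup may not be handled well by this old parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class JdkHtmlParse {
    // Feeds the HTML to the JDK's built-in parser and records start tags, end tags and text.
    static List<String> parse(String html) {
        List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("start " + t);
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end " + t);
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text " + new String(data));
            }
        };
        try {
            // true = ignore any charset declared in the document (we already have a String)
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : parse("<p>Hello <b>world</b></p>")) {
            System.out.println(event);
        }
    }
}
```

The parser may also report implied tags (for example html or body) that are not in the input, so the exact event list depends on the JDK's DTD handling.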