Help! The program should print the title of a web page, but I can't figure out what the error is.

    public static String httpTitle(URL url) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String str;
        String title = "";
        String pattern = "(?i)<title([^>]+)>(.+?)</title>";
        while ((str = in.readLine()) != null) {
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(str);
            m.matches();
            title = m.group(3);
        }
        in.close();
        return title;
    }

    public static void main(String[] args) throws MalformedURLException, IOException {
        URL url = new URL("http://www.google.com.ua");
        System.out.print(httpTitle(url));
    }
    I hate regexps: (?i)<title([^>]+)>(.+?)</title> - well, what is that? Brutal... What does this have to do with programming? It's just mindless shamanism! - Barmaley

2 answers

  • The quantifier for the tag's attributes must be '*' rather than '+', because the tag can close immediately, as in <title>. With '+', a plain <title> never matches.
  • The pattern is applied line by line, but the tag can be split by a line break. So you must first read the entire stream and only then apply the pattern to it, not forgetting the flag that lets '.' match line breaks. In Java that flag is DOTALL; MULTILINE only changes how ^ and $ behave.
  • The first parentheses, (?i), are an inline flag and not a capturing group, so the title is in the second group, not the third: title = m.group(2);
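
The three points above can be sketched roughly as follows. This is only an illustration, not the asker's final code: the class name TitleExtractor, the helpers readAll and extractTitle, and the trim() call are my own assumptions. It also uses Matcher.find() instead of matches(), because matches() succeeds only when the whole input fits the pattern, and calling group() after a failed match throws IllegalStateException. In Java the flag that lets '.' cross line breaks is Pattern.DOTALL, so the sketch uses that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class TitleExtractor {
    // Compiled once. (?i) is an inline flag, not a group, so group 1 holds the
    // tag's attributes and group 2 the title text. DOTALL lets '.' match line breaks.
    private static final Pattern TITLE =
            Pattern.compile("(?i)<title([^>]*)>(.+?)</title>", Pattern.DOTALL);

    // Reads the whole stream into one string so that a title split across
    // lines is still seen as a single piece of text.
    // For a real page: readAll(new InputStreamReader(url.openStream()))
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Returns the trimmed title, or "" when no <title> element is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(2).trim() : "";
    }

    public static void main(String[] args) throws IOException {
        // In-memory page with the title broken across lines; no network needed.
        String page = "<html><head><TITLE>\nExample\npage</TITLE></head></html>";
        System.out.println(extractTitle(readAll(new StringReader(page))));
    }
}
```

Returning an empty string when nothing matches is just one choice; throwing or returning null would work too, depending on what the caller expects.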

With these fixes it should work, but a few things could still be improved.

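  • The '+?' in the tag's content looks like two quantifiers in a row, but it is a single lazy quantifier: one or more characters, as few as possible. It is not equivalent to '*', and it can stay, since it makes the match stop at the first </title>.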
  • Given point 2 this becomes moot, but I have to point out that compiling the same regular expression inside a loop wastes CPU time. Compile it once, outside the loop.

    I would write it like this. It is convenient to test with http://regexpal.com

     (?i)\<title([^>]+)?\>(.+?)\<\/title\>
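
For what it's worth, here is a small sketch of that pattern in Java, compiled once as a constant, next to a greedy variant to show what the lazy '+?' buys you. The class name LazyVsGreedy, the method titleWith, and the sample HTML are made up for the demonstration. The DOTALL flag is my addition, and in a Java string literal every backslash has to be doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class LazyVsGreedy {
    // The pattern from the answer, compiled once. Escaping '<', '>' and '/'
    // is allowed in Java regexes but not required.
    static final Pattern LAZY =
            Pattern.compile("(?i)\\<title([^>]+)?\\>(.+?)\\<\\/title\\>", Pattern.DOTALL);

    // Same pattern with a greedy '+' instead of the lazy '+?', for comparison only.
    static final Pattern GREEDY =
            Pattern.compile("(?i)\\<title([^>]+)?\\>(.+)\\<\\/title\\>", Pattern.DOTALL);

    // Returns group 2 of the first match, or "" when the pattern is not found.
    static String titleWith(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(2) : "";
    }

    public static void main(String[] args) {
        String html = "<title>First</title><svg><title>Second</title></svg>";
        System.out.println(titleWith(LAZY, html));   // prints: First
        System.out.println(titleWith(GREEDY, html)); // prints: First</title><svg><title>Second
    }
}
```

The greedy version runs on to the last </title> in the input, which is why the lazy form is the safer one for pages that contain more than one title element.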