Hello. I have the following code:

    int size = sWebList.size();
    for (int i = 0; i < size; i++) {
        int j = sWebList.size();
        if (j != 0) {
            String sURL = sWebList.get(0);
            System.out.println("----------------------------" + sURL + "-------------------------------------");
            sWebList.remove(0);
            try {
                URL url = new URL("http://www." + sURL + "/robots.txt");
                try {
                    LineNumberReader reader = new LineNumberReader(new InputStreamReader(url.openStream()));
                    String string = reader.readLine();
                    while (string != null) {
                        sRobots.add(string);
                        string = reader.readLine();
                    }
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                for (String line : sRobots) {
                    System.out.println(line);
                }
                sRobots.clear();
            } catch (MalformedURLException ex) {
                ex.printStackTrace();
            }
        }
    }

The loop takes a site name from sWebList, then reads and prints that site's robots.txt. For some sites, the request fails with an Exception if no User-Agent is sent. How do I add setRequestProperty("User-Agent", "Mozilla/5.0"); to this code?

You can do this:

    URLConnection uc;
    StringBuilder parserContentFromUrl = new StringBuilder();
    String urlString = "http://www." + sURL + "/robots.txt";
    try {
        URL url = new URL(urlString);
        uc = url.openConnection();
        uc.addRequestProperty("User-Agent", "Mozilla/5.0");
        uc.connect();
        BufferedInputStream in = new BufferedInputStream(uc.getInputStream());
        int ch;
        // Reads the response one byte at a time
        while ((ch = in.read()) != -1) {
            parserContentFromUrl.append((char) ch);
        }
        System.out.println(parserContentFromUrl);
    } catch (Exception ex) {
        ex.printStackTrace();
    }

But I need the lines to end up in sRobots, whereas the second version reads the content byte by byte. Please help me figure this out.

  • Figured it out: URL url = new URL("http://www."+sURL+"/robots.txt"); URLConnection uc = url.openConnection(); uc.addRequestProperty("User-Agent", "Mozilla/5.0"); uc.connect(); try{ LineNumberReader reader = new LineNumberReader(new InputStreamReader(uc.getInputStream())); - Cenzor
  • If you have figured it out, please post it as an answer. - Alex Chermenin
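
The approach from Cenzor's comment can be written out as a complete, self-contained sketch. This is only one possible arrangement, not necessarily the exact code Cenzor ended up with. The class name RobotsFetcher and the method names readLines and fetchRobots are hypothetical, and decoding the response as UTF-8 is an assumption (the original code used the platform default charset). The main point is that the connection is opened explicitly, so the header can be set before the stream is read, while the reading itself stays line-based:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class RobotsFetcher {

    // Reads the stream line by line, so each line of robots.txt
    // becomes one element of the returned list.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // Assumption: the response is decoded as UTF-8.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Opens the connection explicitly instead of calling url.openStream(),
    // so the User-Agent header can be set before the request is sent.
    static List<String> fetchRobots(String host) throws IOException {
        URL url = new URL("http://www." + host + "/robots.txt");
        URLConnection uc = url.openConnection();
        uc.setRequestProperty("User-Agent", "Mozilla/5.0");
        return readLines(uc.getInputStream());
    }
}
```

In the loop from the question, the inner try block could then shrink to a single call such as sRobots.addAll(RobotsFetcher.fetchRobots(sURL)); with the existing catch blocks for IOException and MalformedURLException kept as they are.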
