I need to parse the page and extract the train schedule from it. Jsoup does not seem to work here: for some reason it fails on this page and throws an error:

Caused by: java.lang.IndexOutOfBoundsException: Invalid index 0, size is 0

Code:

Document doc2 = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today").get();
Element table2 = doc2.select("table.schedule_table_classic").get(0);

Can somebody explain whether this is XML, and how the schedule can be extracted?

    4 answers

    Try this. It works for me and prints both the rows and the columns.

      doc2 = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today")
              .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
              .method(org.jsoup.Connection.Method.GET)
              .get();
      Element table = doc2.getElementById("schedule_table");
      Elements tr = table.getElementsByTag("tr");
      Elements td = tr.select("td");
      for (Element el : td) System.out.println(el);
    • Caused by: java.lang.NullPointerException on the line Elements tr=table_tr.getElementsByTag("tr"); I do not know why it works for you, but it does not work for me. Maybe there is another way to parse the schedule? For example, if the site offers iframe code for embedding, can it somehow be used to pull out the schedule? - GinTR1k
    • @Mr.GinTR1k, try this: doc2 = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today").userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36").method(org.jsoup.Connection.Method.GET).get(); - andy
    • @Mr.GinTR1k, I always use Jsoup for parsing because it is convenient and quite simple. There are many ready-made HTML parsers for Java; search for them. You can also try HtmlUnit, another good library. - andy

    Your error indicates that doc2.select("table.schedule_table_classic") returns an empty list of elements, and you then try to take its first element, which does not exist.
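The failure mode described here can be guarded against before the element is used. Below is a minimal sketch using only the standard library; the class name FirstElementGuard and the helper firstOrNull are hypothetical names chosen for illustration. Jsoup's Elements is a java.util.ArrayList subclass, so the same isEmpty() check applies to it, and Elements.first() already returns null for an empty result.

```java
import java.util.ArrayList;
import java.util.List;

class FirstElementGuard {
    // Returns the first item, or null when the list is empty,
    // instead of letting get(0) throw IndexOutOfBoundsException.
    static <T> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        // Stands in for the empty result of doc2.select("table.schedule_table_classic").
        List<String> matches = new ArrayList<>();
        String table = firstOrNull(matches);
        if (table == null) {
            System.out.println("Selector matched nothing; check the HTML that was actually downloaded.");
        }
    }
}
```

With the guard in place, an empty match becomes an explicit branch in the code rather than a crash.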

    Solution:

    Examine the target page (its source code) again and write a valid expression that finds the desired list of elements.

     Elements tableRows = doc2.getElementById("schedule_table").getElemetsByTagName("tr"); 
    • Elements tableRows = doc2.getElementById("schedule_table").getElemetsByTagName("tr"); gives the error "Cannot find symbol method getElemetsByTagName(String)". I experimented with this line and ended up with: Elements tableRows = doc2.getElementById("schedule_table").getElementsByTag("tr"); There is no error anymore, at least at compile time. But as soon as execution reaches this line, another error occurs: Caused by: java.lang.NullPointerException - GinTR1k
    • @Mr.GinTR1k, well, maybe the page is built only after it is displayed in the browser? That is, it may be impossible to parse it with Jsoup... Write the HTML that you receive to the logs or to a file; maybe the required information is not there. - Yuriy SPb
    • If that is the case, what should I do? How can I parse the table with the schedule? - GinTR1k
    • @Mr.GinTR1k, if that is the problem, you could hook up some library that accesses the page through a browser. But most likely you just need to examine the page code thoroughly and try to pull out at least something, printing intermediate results to the logs. - Yuriy SPb
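The suggestion to write the received HTML to a file and check whether the data is there can be sketched as follows. This is a rough illustration using only the standard library; the class HtmlDump and its methods are hypothetical names, and the inline HTML is made-up sample input. In the real program the string could come from doc2.outerHtml().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlDump {
    // Writes the downloaded HTML to a file so it can be inspected in an editor.
    static void save(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    // Rough text check (lowercase tags assumed, not a real HTML parser):
    // is there a "<tr" between the table id and the next "</table"?
    static boolean hasRowsAfterId(String html, String tableId) {
        int idPos = html.indexOf("id=\"" + tableId + "\"");
        if (idPos < 0) return false;
        int close = html.indexOf("</table", idPos);
        int end = close < 0 ? html.length() : close;
        return html.substring(idPos, end).contains("<tr");
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: the table exists but has no rows.
        String html = "<table id=\"schedule_table\" class=\"schedule_table_classic\"></table>";
        Path out = Files.createTempFile("page", ".html");
        save(html, out);
        System.out.println("Saved to " + out);
        System.out.println("Rows present: " + hasRowsAfterId(html, "schedule_table"));
    }
}
```

If the check reports no rows in the HTML the server actually returned, no selector will find the schedule, and the data has to come from somewhere else.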

    Try it with a timeout. With this code, everything works for me.

      Document doc = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today")
              .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
              .timeout(150000)
              .get();
      Element table = doc.getElementById("schedule_table");
      Elements tr = table.getElementsByTag("tr");
      Elements td = tr.select("td");
      for (Element el : td) System.out.println(el);
       URL url = new URL("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today");
       HttpURLConnection request = (HttpURLConnection) url.openConnection();
       request.setRequestMethod("GET");
       request.connect();
       InputStream is = request.getInputStream();
       BufferedReader br = new BufferedReader(new InputStreamReader(is));
       StringBuilder sb = new StringBuilder();
       String rline = null;
       while ((rline = br.readLine()) != null) {
           sb.append(rline);
       }

      and then the string can be parsed with Jsoup

       org.jsoup.nodes.Document docu = Jsoup.parse(sb.toString()); 

      and so on
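One detail of reading the response by hand: InputStreamReader without an explicit charset uses the platform default, and appending lines without a separator glues them together. A minimal sketch of a reading helper that avoids both follows. The class StreamText and method readAll are hypothetical names, and UTF-8 is only an assumption; the actual encoding should be taken from the response's Content-Type header or the page's meta tag.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamText {
    // Reads the whole stream as text with an explicit charset,
    // keeping line breaks and closing the reader when done.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample bytes standing in for request.getInputStream().
        byte[] bytes = "<tr>\n<td>\u041c\u043e\u0441\u043a\u0432\u0430</td>\n</tr>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

The resulting string can be passed to Jsoup.parse in the same way as sb.toString() above.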

      • There is still no table. More precisely, the table element itself (with its class) is there, but the schedule itself (times, destinations, etc.) is not. - GinTR1k
      • @Mr.GinTR1k, what does the structure of the XML file look like? - andy
      • As I understand it, this is not XML. Most likely the schedule appears as a result of JavaScript execution, judging by the names and addresses of the scripts on the page. Please forgive my ignorance: this is my first Java project, and I am only a first-year student. - GinTR1k
      • @Mr.GinTR1k, the code you wrote above works; the page can be parsed. - andy
      • It does parse, but the table.schedule_table_classic element does not contain a single tr element. So the schedule is not loaded together with the page. - GinTR1k
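If the schedule really is filled in by JavaScript, one way to investigate is to list the scripts the page references and look for the one that loads the data. The sketch below uses a rough regular expression rather than an HTML parser; the class ScriptSources is a hypothetical name and the script paths in the sample are made up, not taken from the real page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptSources {
    // Matches <script ... src="..."> or src='...'; a rough pattern, not a full HTML parser.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Collects the src value of every external script in document order.
    static List<String> find(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup; inline scripts without src are skipped.
        String html = "<script src=\"/js/a.js\"></script><script>var x = 1;</script>"
                + "<script type='text/javascript' src='/js/b.js'></script>";
        for (String url : find(html)) {
            System.out.println(url);
        }
    }
}
```

Opening the listed scripts, or watching the browser's network tab, may reveal a separate request that returns the schedule data, which can then be fetched directly instead of the page.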