JSoup: Problem with IndexOutOfBoundsException

Question

I need to parse a page and extract the train schedule from it. Jsoup does not work for me: for some reason it fails to process the page and throws this error:

Caused by: java.lang.IndexOutOfBoundsException: Invalid index 0, size is 0

Code:

  Document doc2;
  doc2 = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today").get();
  Element table2 = doc2.select("table.schedule_table_classic").get(0);

Could someone explain whether this is XML, and how I can extract the schedule?

Accepted Answer · 2016-04-05T08:41:59

Try this. It works for me and prints both the rows and the columns.

  doc2 = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today")
      .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
      .method(org.jsoup.Connection.Method.GET)
      .get();
  Element table = doc2.getElementById("schedule_table");
  Elements tr = table.getElementsByTag("tr");
  Elements td = tr.select("td");
  for (Element el : td)
      System.out.println(el);

Caused by: java.lang.NullPointerException on the line Elements tr=table_tr.getElementsByTag("tr");
For example, if the site provides an iframe embed code for export, can it somehow be used to pull out the schedule?
@Mr.GinTR1k, try this: doc2 = Jsoup.connect("tutu.ru/rasp.php?st1=16503&st2=17003&date=today").userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36").method(org.jsoup.Connection.Method.GET).get();
@Mr.GinTR1k, I always use Jsoup for parsing because it is convenient and quite simple.

Yuriy SPb ♦ YuriySPb 58.6k 7 48 99 · Answer 2 · 2016-04-03T12:31:01

Your error indicates that doc2.select("table.schedule_table_classic") returns an empty list of elements, and you then try to take its first element, which does not exist.

Solution:

Examine the target page (its source code) again and build a valid expression that finds the desired list of elements.

  Elements tableRows = doc2.getElementById("schedule_table").getElementsByTag("tr");
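
The guard against taking the first element of an empty result can be sketched with plain JDK lists, since jsoup's Elements class extends ArrayList<Element> and so get(0) behaves the same way. The class name SafeFirst and its first method are hypothetical helpers invented for this illustration; they are not part of jsoup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not a jsoup class): safe access to the first list item.
class SafeFirst {

    // Returns the first item, or an empty Optional when the list has no items,
    // instead of letting get(0) throw IndexOutOfBoundsException.
    static <T> Optional<T> first(List<T> items) {
        return items.isEmpty() ? Optional.empty() : Optional.ofNullable(items.get(0));
    }

    public static void main(String[] args) {
        // Stand-in for a selector that matched nothing.
        List<String> noMatches = Collections.emptyList();

        // Stand-in for a selector that matched one element.
        List<String> oneMatch = new ArrayList<>();
        oneMatch.add("<table>");

        System.out.println(first(noMatches).orElse("no table found"));
        System.out.println(first(oneMatch).orElse("no table found"));
    }
}
```

With jsoup itself, a similar check is to test the result of select(...) with isEmpty() before calling get(0), or to call first() on it and compare the result with null.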

Elements tableRows = doc2.getElementById("schedule_table").getElemetsByTagName("tr"); gives the error "Cannot find symbol method getElemetsByTagName(String)". I played with this line and ended up with the following: Elements tableRows = doc2.getElementById("schedule_table").getElementsByTag("tr"); Now there is no error, at least at compile time. But as soon as execution reaches this line, another error occurs: Caused by: java.lang.NullPointerException - GinTR1k
@Mr.GinTR1k, well, maybe the page content is generated after the page is displayed in the browser? That is, it may be impossible to parse it with Jsoup... Write the HTML you receive to the logs or to a file; the required information may simply not be there. - Yuriy SPb ♦
If it is not there, what should I do? How can I parse the table with the schedule? - GinTR1k
@Mr.GinTR1k, if that is the problem, you could hook up some library that accesses the page through a browser. But most likely you just need to study the page source thoroughly and try to pull out at least something, printing intermediate results to the logs. - Yuriy SPb ♦
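
The advice above about dumping the received HTML and checking it for the required information could look roughly like this. It is a minimal sketch using only the JDK: the class HtmlDump and its methods are hypothetical names made up for the example, and the HTML string is a stub standing in for whatever the server actually returned (for instance the result of doc2.outerHtml()).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of jsoup): saves HTML and reports which markers it contains.
class HtmlDump {

    // Writes the raw HTML to a file so it can be opened in an editor and inspected.
    static Path save(String html, Path target) throws IOException {
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    // Maps each marker (an id, a class name, a tag prefix) to whether it occurs in the HTML.
    static Map<String, Boolean> findMarkers(String html, List<String> markers) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String marker : markers) {
            found.put(marker, html.contains(marker));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Stub input (an assumption): a table that exists but has no rows.
        String html = "<table id=\"schedule_table\" class=\"schedule_table_classic\"></table>";

        Path file = save(html, Files.createTempFile("page-dump", ".html"));
        System.out.println("Saved to " + file);

        // Shows at a glance whether the table is present and whether it has any rows.
        System.out.println(findMarkers(html, Arrays.asList("schedule_table", "<tr")));
    }
}
```

If the table marker is found but "<tr" is not, the rows are probably added later by a script, and no selector will find them in the downloaded HTML.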

Tryserg tryserg 80 7 · Answer 3 · 2016-04-05T11:06:56

Try setting a timeout. With this code, everything works for me.

  Document doc = Jsoup.connect("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today")
      .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
      .timeout(150000)
      .get();
  Element table = doc.getElementById("schedule_table");
  Elements tr = table.getElementsByTag("tr");
  Elements td = tr.select("td");
  for (Element el : td)
      System.out.println(el);

andy andy 134 1 11 · Answer 4 · 2016-04-04T11:47:23

  URL url = new URL("http://www.tutu.ru/rasp.php?st1=16503&st2=17003&date=today");
  HttpURLConnection request = (HttpURLConnection) url.openConnection();
  request.setRequestMethod("GET");
  request.connect();
  InputStream is = request.getInputStream();
  BufferedReader br = new BufferedReader(new InputStreamReader(is));
  StringBuilder sb = new StringBuilder();
  String rline = null;
  while ((rline = br.readLine()) != null) {
      sb.append(rline);
  }

and then the string can be parsed with Jsoup

 org.jsoup.nodes.Document docu = Jsoup.parse(sb.toString());

and so on
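
One possible refinement of the reading loop above, offered only as a sketch: pass an explicit charset to InputStreamReader and keep the line breaks, so that Cyrillic text does not depend on the platform default encoding. The class PageReader and the method readAll are hypothetical names, and UTF-8 is an assumption here; the page's real encoding should be taken from its Content-Type header or meta tag.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream into a String with a known charset.
class PageReader {

    // Reads every line, re-appending '\n' after each one, and closes the stream when done.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stub input instead of a network stream: the Cyrillic word for "Moscow", escaped.
        String sample = "<td>\u041c\u043e\u0441\u043a\u0432\u0430</td>\n<td>10:15</td>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));

        String html = readAll(in, StandardCharsets.UTF_8);
        System.out.println(html.length() + " characters read");
    }
}
```

In the answer's code, the stream passed to such a helper would be request.getInputStream(), and the returned string would go to Jsoup.parse as before.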

More precisely, the table itself is there, with its class, but the schedule itself (times, destinations, etc.) is not.
Judging by the names and URLs of the scripts on the page, the schedule most likely appears as a result of JavaScript execution.
This is my first project in Java, and I am still only a first-year student :)
@Mr.GinTR1k, the code you wrote above works; the page can be parsed.
It does parse, but the table.schedule_table_classic element does not contain a single element with the tr tag.
