Hello. I am writing an Android application that parses a work schedule from a corporate web page. The original page contains a single table with the schedules for all 12 months for every employee in the department, about 600-650 rows in total. When I run this code:
Document doc = Jsoup.connect("http://url.htm").get();
the document is loaded into doc, as expected. However, the following expression:
doc.select("tr").size(); returns 451. The first 451 rows parse without any problems (almost), so where are the rest?
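One way to narrow this down is to check whether the full page reaches the client at all. The sketch below uses only the JDK (no Jsoup) to download the raw bytes and count opening `<tr` tags, so the result can be compared with the 451 from `doc.select("tr").size()`. The class `RowCounter` and its methods are hypothetical names for illustration, and the URL is the placeholder from the question. It is meant to run on a desktop JVM, since Android does not allow network access on the main thread. If the raw count comes out near 600-650 while Jsoup sees 451, one assumption worth testing is Jsoup's response size cap: `Connection.maxBodySize(int)` truncates bodies larger than a default limit, and `Jsoup.connect(url).maxBodySize(0).get()` disables the cap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RowCounter {

    // Counts opening <tr> tags case-insensitively, skipping look-alikes such as <track>.
    static int countRowTags(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        int count = 0;
        int idx = 0;
        while ((idx = lower.indexOf("<tr", idx)) != -1) {
            int next = idx + 3;
            // A real <tr> tag is followed by '>', whitespace, or the end of the text.
            if (next >= lower.length()
                    || lower.charAt(next) == '>'
                    || Character.isWhitespace(lower.charAt(next))) {
                count++;
            }
            idx = next;
        }
        return count;
    }

    // Reads the whole response body with no size limit (hypothetical helper).
    static byte[] download(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL from the question; pass the real page URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://url.htm";
        byte[] body = download(url);
        // ISO-8859-1 maps each byte to exactly one char, so ASCII tag names
        // are preserved whatever the page's actual encoding is.
        String html = new String(body, StandardCharsets.ISO_8859_1);
        System.out.println("bytes: " + body.length);
        System.out.println("<tr> tags: " + countRowTags(html));
    }
}
```

Run it as `java RowCounter <page-url>` and compare the printed tag count with the 451 rows Jsoup returns; the byte count also shows how large the page actually is.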
Here is a fragment of the original page:
<tr height=17 style='height:12.75pt'> <td height=17 class=xl9817500 style='height:12.75pt;border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6817500 style='border-top:none'> </td> <td class=xl12217500 style='border-top:none'>0</td> <td class=xl12217500 style='border-top:none'>0</td> <td class=xl13817500 style='border-top:none'>0</td> <td class=xl13117500 style='border-top:none'>0</td> <td class=xl13117500 style='border-top:none'>0</td> <td class=xl10517500 
style='border-top:none'>0</td> <td class=xl6553517500></td> <td class=xl6553517500></td> <td class=xl6553517500></td> </tr> <tr height=17 style='height:12.75pt'> <td height=17 class=xl9817500 style='height:12.75pt;border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6717500 style='border-top:none'> </td> <td class=xl6917500 style='border-top:none'> </td> <td class=xl6817500 style='border-top:none'> </td> <td class=xl12217500 style='border-top:none'>0</td> <td class=xl12217500 style='border-top:none'>0</td> <td class=xl13817500 style='border-top:none'>0</td> <td class=xl13117500 
style='border-top:none'>0</td> <td class=xl13117500 style='border-top:none'>0</td> <td class=xl10517500 style='border-top:none'>0</td> <td class=xl6553517500></td> <td class=xl6553517500></td> <td class=xl6553517500></td> </tr>
The entire page consists of tr and td elements like these. This is the fragment where the document gets cut off: the document Jsoup downloads into doc ends inside the second tr above, at the 23rd td. As far as I can tell, the table was generated automatically:
<!--[if !excel]> <![endif]--> <!--The following information was prepared by the Microsoft Excel Web Page Publishing Wizard.--> <!--If this document is republished from Excel, all information between the DIV tags will be replaced.--> <!-----------------------------> <!--START OF EXCEL WEB PAGE PUBLISHING WIZARD FRAGMENT --> <!-----------------------------> What could be causing this?