Good day. I am completely stuck: I cannot figure out how to properly parse the block with this class on the page and put the result into TableRow elements:

<div id="contrighta" class="contrighta">
  <h1>Some label</h1><br>
  <table>
    <tbody>
      <tr>
        <th width="420" align="left" valign="top">News Title</th>
        <th width="70 " align="left" valign="top">Date</th>
        <th width="120" align="left" valign="top">News Category</th>
        <th width="100" align="left" valign="top">Language</th>
      </tr>
      <tr>
        <td align="left" valign="top" class="bodyblack"> <a href="some_link">Some text</a></td>
        <td align="left" valign="top" nowrap="">Some data</td>
        <td align="left" valign="top" nowrap="">Some people</td>
        <td align="left" valign="top" nowrap="">Some language</td>
      </tr>
      <tr>
        <td align="left" valign="top" class="bodyblack"> <a href="some_link">Some text</a></td>
        <td align="left" valign="top" nowrap="">Some data</td>
        <td align="left" valign="top" nowrap="">Some people</td>
        <td align="left" valign="top" nowrap="">Some language</td>
      </tr>
      <tr>
        <td align="left" valign="top" class="bodyblack"> <a href="some_link">Some text</a></td>
        <td align="left" valign="top" nowrap="">Some data</td>
        <td align="left" valign="top" nowrap="">Some people</td>
        <td align="left" valign="top" nowrap="">Some language</td>
      </tr>
      <tr>
        <td align="left" valign="top" class="bodyblack"> <a href="some_link">Some text</a></td>
        <td align="left" valign="top" nowrap="">Some data</td>
        <td align="left" valign="top" nowrap="">Some people</td>
        <td align="left" valign="top" nowrap="">Some language</td>
      </tr>
    </tbody>
  </table>

I can already get at the text, but it comes out as one unstructured heap. It should be grouped row by row and then somehow output into TableRow elements:

 if (doc != null) {
     Elements tableRows = doc.getElementsByClass("contrighta").select("tr");
     Iterator<Element> rowIterator = tableRows.iterator();
     while (rowIterator.hasNext()) {
         Element tableRow = rowIterator.next();
         // <td align="left" valign="top" class="bodyblack"><a href="some_link">Some text</a></td>
         Elements data = tableRow.select("td");
         // Log.d("NewsFragment", "" + data);
         for (Element link : data) {
             Log.d("Return: ", "" + link.text());
         }
     }
 }

    1 answer

    • The markup was missing one closing div.
    • Just in case, I wrapped the markup in html / head / body.
    • For convenience, I put the data cells of each row into a map.
    • When iterating over the rows, I skipped the header row, because it contains no data cells.

      String html = "<html>"
              + "<head></head>"
              + "<body>"
              + "<div id=\"contrighta\" class=\"contrighta\">\n"
              + "<h1>Some label</h1><br>\n"
              + "<table>\n"
              + " <tbody>\n"
              + " <tr>\n"
              + " <th width=\"420\" align=\"left\" valign=\"top\">News Title</th>\n"
              + " <th width=\"70 \" align=\"left\" valign=\"top\">Date</th>\n"
              + " <th width=\"120\" align=\"left\" valign=\"top\">News Category</th>\n"
              + " <th width=\"100\" align=\"left\" valign=\"top\">Language</th>\n"
              + " </tr>\n"
              + " <tr>\n"
              + " <td align=\"left\" valign=\"top\" class=\"bodyblack\">\n"
              + " <a href=\"some_link\">Some text</a></td>\n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some data</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some people</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some language</td>\n"
              + " </tr>\n"
              + " <tr>\n"
              + " <td align=\"left\" valign=\"top\" class=\"bodyblack\">\n"
              + " <a href=\"some_link\">Some text</a></td>\n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some data</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some people</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some language</td>\n"
              + " </tr>\n"
              + " <tr>\n"
              + " <td align=\"left\" valign=\"top\" class=\"bodyblack\">\n"
              + " <a href=\"some_link\">Some text</a></td>\n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some data</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some people</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some language</td>\n"
              + " </tr>\n"
              + " <tr>\n"
              + " <td align=\"left\" valign=\"top\" class=\"bodyblack\">\n"
              + " <a href=\"some_link\">Some text</a></td>\n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some data</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some people</td> \n"
              + " <td align=\"left\" valign=\"top\" nowrap=\"\">Some language</td>\n"
              + " </tr>\n"
              + " </tbody>\n"
              + "</table>"
              + "</div>"
              + "</body>"
              + "</html>";

      Document doc = Jsoup.parse(html);

      // One map per data row, keyed by column meaning.
      List<HashMap<String, String>> table = new ArrayList<HashMap<String, String>>();
      for (Element row : doc.select("div#contrighta table tr")) {
          Elements cells = row.select("td");
          // The header row has only TH cells, so it yields no TD elements: skip it.
          if (cells.size() == 0) continue;
          HashMap<String, String> map = new HashMap<String, String>();
          map.put("text", cells.get(0).text());
          map.put("data", cells.get(1).text());
          map.put("people", cells.get(2).text());
          map.put("language", cells.get(3).text());
          table.add(map);
      }

      for (Map<String, String> map : table) {
          Log.i("JSOUP/Data", "text = " + map.get("text"));
          Log.i("JSOUP/Data", "data = " + map.get("data"));
          Log.i("JSOUP/Data", "people = " + map.get("people"));
          Log.i("JSOUP/Data", "language = " + map.get("language") + "\n");
      }

    To include the header row in the list together with the data, you need to:

    • replace the extraction of all TD nodes from the row ( row.select("td") ) with the extraction of all child elements ( row.children() )
    • remove the condition that skips rows without TD cells: if (cells.size() == 0) continue;

    Before:

      Elements cells = row.select("td");
      if (cells.size() == 0) continue;

    After:

      Elements cells = row.children();
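
Once the header row is part of the list, its texts can serve as map keys instead of the hard-coded "text", "data", "people" and "language". The following is only a minimal sketch of that idea in plain Java collections. It assumes the cell texts have already been pulled out of each row (for example with row.children() and text()) into lists of strings, and that the first list is the header row. The class name HeaderKeyedRows and the method toMaps are hypothetical and are not part of jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a jsoup class): turns extracted cell texts
// into one map per data row, keyed by the header texts.
final class HeaderKeyedRows {

    private HeaderKeyedRows() {}

    // Assumption: rows.get(0) holds the header texts (the TH cells) and every
    // following list holds the texts of one data row in the same column order.
    static List<Map<String, String>> toMaps(List<List<String>> rows) {
        List<Map<String, String>> table = new ArrayList<>();
        if (rows.isEmpty()) {
            return table;
        }
        List<String> headers = rows.get(0);
        for (List<String> cells : rows.subList(1, rows.size())) {
            // LinkedHashMap keeps the column order of the original table.
            Map<String, String> map = new LinkedHashMap<>();
            // Only pair up as many cells as there are headers (and vice versa).
            int columns = Math.min(headers.size(), cells.size());
            for (int i = 0; i < columns; i++) {
                map.put(headers.get(i), cells.get(i));
            }
            table.add(map);
        }
        return table;
    }
}
```

With the table from the question, each resulting map would then be read as map.get("News Title"), map.get("Date") and so on, and the same header list can be reused to label the columns when the rows are displayed.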