The site presents each article as an HTML block like this:
<article class="day_news_item">
  <div class="day_news_item_img">
    <a href="/world/20151117/1322854695.html">
      <img src="http://cdn12.img22.ria.ru/images/132265/34/1322653488.jpg" alt="Президент России Владимир Путин. Архивное фото" title="Президент России Владимир Путин. Архивное фото" width="230" height="130" class="media"></a>
  </div>
  <div class="day_news_item_text">
    <div class="day_news_item_title">
      <h3> <a href="/world/20151117/1322854695.html">Путин: совместная работа Китая и России стабилизирует обстановку в мире</a> </h3>
    </div>
    <div class="day_news_item_announce">
      <a href="/world/20151117/1322854695.html">Сотрудничество России и Китайской Народной Республики двигается вперед в области военно-технического сотрудничества, что является серьезным фактором, стабилизирующим обстановку в мире, заявил президент РФ. </a>
    </div>
  </div>
</article>

A single page can contain many of these blocks. Here is the constructor of the class whose object I need to create for each news item:
public News(String imgRef, String title, String text, String date, String announce) {
    this.imgRef = imgRef;
    this.title = title;
    this.text = text;
    this.date = date;
    this.announce = announce;
}

Question: what is the best way to do this with JSoup? I do not really understand how to extract information from nested tags and their attributes.
I hope I have formulated the question clearly. Thanks!
I saved the site with wget, looked through its HTML, and found exactly the content I need there (along with all its nested tags and attributes).
That content was preceded by the following markup:
<script>$(document).ready(function() { checkBannerHeight(17); });</script></div></div><div xmlns:str="http://exslt.org/strings" class="day_news"><div class="day_news_wrapper">

Maybe this gives a hint on how to get the data from the main page, without following the link to each news item separately?
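
For reference, here is a minimal sketch of how the markup above could be read with JSoup selectors. It rests on several assumptions that are not stated in the question: the class name `NewsParser` and the helper `extractDate` are hypothetical, the nested `News` class is only a stand-in for the real one, the date is guessed from the 8-digit segment of the article link (`/world/20151117/...`), and the full article `text` is left as `null` because it does not appear in the listing markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NewsParser {

    // Stand-in for the News class from the question (fields assumed from its constructor).
    public static class News {
        public final String imgRef, title, text, date, announce;

        public News(String imgRef, String title, String text, String date, String announce) {
            this.imgRef = imgRef;
            this.title = title;
            this.text = text;
            this.date = date;
            this.announce = announce;
        }
    }

    private static final Pattern DATE_IN_PATH = Pattern.compile("/(\\d{8})/");

    // Builds one News object per <article class="day_news_item"> found in the HTML.
    public static List<News> parse(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<News> result = new ArrayList<>();

        for (Element article : doc.select("article.day_news_item")) {
            // Nested tags are reached with descendant selectors scoped to this article.
            Element img = article.selectFirst("div.day_news_item_img img");
            Element titleLink = article.selectFirst("div.day_news_item_title a");
            Element announceLink = article.selectFirst("div.day_news_item_announce a");
            if (img == null || titleLink == null) {
                continue; // skip blocks that do not match the expected layout
            }

            String imgRef = img.absUrl("src");      // attribute value, resolved against baseUri
            String title = titleLink.text();        // visible text of the tag
            String announce = announceLink != null ? announceLink.text() : "";
            String date = extractDate(titleLink.attr("href")); // assumption: yyyyMMdd in the URL

            // The full article text is not part of the listing, so it stays null here.
            result.add(new News(imgRef, title, null, date, announce));
        }
        return result;
    }

    // Hypothetical helper: returns the 8-digit path segment, or "" if there is none.
    static String extractDate(String href) {
        Matcher m = DATE_IN_PATH.matcher(href);
        return m.find() ? m.group(1) : "";
    }
}
```

With a live page, the HTML would typically come from `Jsoup.connect(url).get()` instead of a string. Because every selector is scoped to one `article` element, all items on the main page are collected in a single pass, and only the full `text` would require opening each article link.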