What did I miss?
Before the loop, you assign:
String linkHref = link.attr("abs:href");
and you never change this variable anywhere inside the loop, so the same value is repeated on every iteration.
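To make the difference concrete, here is a minimal sketch in plain Java, with no Jsoup involved. The class name `LoopVariableDemo`, both method names, and the list of strings standing in for the links are hypothetical and are not taken from the question's code; the sketch only contrasts reading a value once before the loop with reading it on every iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: a list of strings stands in for the parsed links.
public class LoopVariableDemo {

    // Pattern from the question: the value is read once, before the loop,
    // so every iteration reuses the same string.
    static List<String> readOnceBeforeLoop(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        if (hrefs.isEmpty()) {
            return result;
        }
        String linkHref = hrefs.get(0); // assigned once, never updated
        for (int i = 0; i < hrefs.size(); i++) {
            result.add(linkHref);
        }
        return result;
    }

    // Corrected pattern: the value is read inside the loop,
    // so each iteration sees its own element.
    static List<String> readInsideLoop(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            String linkHref = href; // re-assigned on every iteration
            result.add(linkHref);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://www.novostiit.net/category/it",
                "http://www.novostiit.net/category/video");
        System.out.println(readOnceBeforeLoop(hrefs));
        System.out.println(readInsideLoop(hrefs));
    }
}
```

In the original code, the equivalent fix would presumably be to move the `link.attr("abs:href")` call inside the loop body so that it is evaluated for the current element.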
This task can be solved more simply.
Look at the HTML source of these elements:
<li class="cat-item cat-item-20"><a href="http://www.novostiit.net/category/it" title="IT технологии, новости информационных технологий, новости it технологий, новости технологии, новости it технологии ">IT технологии</a> (407) </li>
<li class="cat-item cat-item-18"><a href="http://www.novostiit.net/category/video" title="видео новости">Видео</a> (484) </li>
...
For all of these elements, the class attribute starts with cat-item cat-item (and no other elements on this page have similar attributes), so you can select them like this:
Document page = Jsoup.connect("http://www.novostiit.net/category/company").get();
Elements categories = page.select("li[class^='cat-item cat-item']");
Then get the name and link of each category like this:
for (Element category : categories) {
    System.out.println(category.select("a").text());
    System.out.println(category.select("a").attr("href") + "\n");
}
Output to console:
IT технологии
http://www.novostiit.net/category/it

Видео
http://www.novostiit.net/category/video

Все новости
http://www.novostiit.net/category/novosti

Гаджеты
http://www.novostiit.net/category/gadgety

Игры
http://www.novostiit.net/category/igry

...
P.S. Jsoup has a very handy tool.