On Android, Jsoup does not load the full HTML text when parsing a page

Question

Hello. I am parsing an HTML site on Android. Here is some simple code:

    class Description extends AsyncTask<Void, Void, Void> {
        public String desc;

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected Void doInBackground(Void... params) {
            try {
                Document document = Jsoup.connect("http://www.google.ru").get();
                desc = document.html();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void result) {
            Log.i("test", desc);
        }
    }

However, only part of the HTML is loaded; it breaks off at a random place. When I check the same code in Eclipse, everything is displayed correctly.

@YuriSPB @metalurgus I do not think the problem is with the text output.

Fetching the HTML with HttpURLConnection gives the same result. The point is not that some blocks are missing from the page. The output breaks off at a random place, sometimes even in the middle of a tag name. This is unlikely to be caused by peculiarities of String or of the logs: I tried line-by-line logging from an InputStream and the result was the same. (!) Besides, please look at the last screenshot, where I selected the 'a' tags: nothing was found. (!) In Eclipse, everything works fine.


Nick Volynkin ♦ 24.6k 14 gold badges 95 silver badges 175 bronze badges · Answer 1 · 2016-01-25T07:51:02

 Document document = Jsoup.connect("http://www.google.ru").maxBodySize(0).get();

The problem is solved by maxBodySize(0), which removes the limit on the page size. You can also remove the timeout with timeout(0).

Answer 2 · 2015-11-18T05:28:27

The page you are downloading ( http://www.google.ru ) contains almost no HTML markup (apart from html, head and script); all of its content is generated dynamically with JavaScript. This means that you do receive the whole page source, but Jsoup does not execute JavaScript.

@BORSHEVIK The point is not that some blocks are missing from the page.
The output breaks off at a random place, sometimes even in the middle of a tag name.
This is unlikely to be caused by peculiarities of String or of the logs.
I tried line-by-line logging from an InputStream and the result was the same.
(!) Besides, please look at the last screenshot, where I selected the 'a' tags.

Yuriy SPb ♦ YuriySPb 58.6k 7 gold badges 50 silver badges 99 bronze badges · Answer 3 · 2015-11-16T01:46:46

Most likely, the problem is not that the HTML was only partially loaded, but that Log simply did not print the whole text.

Try splitting the resulting string into small pieces and logging them one by one in a loop.

Alternatively, just try using Jsoup to access the last tag in the loaded HTML.
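The two checks suggested above can be sketched in plain Java. This is only an illustration under stated assumptions: `LogChunker`, `split`, `endsWithClosingHtmlTag` and the chunk size of 3000 are hypothetical names and values, not part of Android or Jsoup. The chunk size is chosen on the assumption that Logcat truncates a single entry at roughly 4,000 characters. The second method is a plain string check used here in place of the Jsoup lookup of the last tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for diagnosing truncated log output; not an Android or Jsoup API. */
final class LogChunker {
    // Assumed safe size: Logcat cuts off long entries at roughly 4,000 characters.
    static final int CHUNK_SIZE = 3000;

    private LogChunker() {
    }

    /** Splits text into consecutive pieces of at most chunkSize characters. */
    static List<String> split(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null) {
            return chunks; // nothing was downloaded, so there is nothing to log
        }
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    /** Rough completeness check: does the downloaded text end with a closing html tag? */
    static boolean endsWithClosingHtmlTag(String html) {
        return html != null
                && html.trim().toLowerCase(Locale.ROOT).endsWith("</html>");
    }
}
```

In onPostExecute one could then loop over `LogChunker.split(desc, LogChunker.CHUNK_SIZE)` and pass each piece to `Log.i("test", piece)`, and also log `desc.length()` together with `LogChunker.endsWithClosingHtmlTag(desc)`. If the length is large and the closing tag is present, the download was complete and only the log output was cut.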

BORSHEVIK BORSHEVIK 2,477 11 silver badges 30 bronze badges · Answer 4 · 2015-11-18T09:11:30

Metalurgus is right; I ran into this problem myself. The problem is that you download the HTML page, but its JavaScript is not executed, so you do not see the full picture.


Legionary 1,407 5 silver badges 16 bronze badges · Answer 5 · 2016-09-22T06:42:20

maxBodySize(0) alone did not help me. I got the full HTML when I wrote:

 doc = Jsoup.connect(mLink)
         .maxBodySize(0)
         .userAgent("Mozilla/5.0 (Windows NT 5.1) "
                 + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
         .timeout(0)
         .get();
