Hello. I am parsing an HTML page on Android. Here is the basic code:

    class Description extends AsyncTask<Void, Void, Void> {
        public String desc;

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected Void doInBackground(Void... params) {
            try {
                Document document = Jsoup.connect("http://www.google.ru").get();
                desc = document.html();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void result) {
            Log.i("test", desc);
        }
    }

However, only part of the HTML is loaded: it is cut off at a random place. When I check the same thing in Eclipse, everything is displayed correctly.

@YuriSPB @metalurgus I do not think the problem is with the text output.

The same thing happens when I fetch the HTML with HttpURLConnection. The point is not that some blocks are missing from the page. The output is cut off at a random place, sometimes even in the middle of a tag name. This is unlikely to be caused by peculiarities of String or of the log. I also logged the response line by line using an InputStream, and the result is the same. (!) Also, please look at the last screenshot: there I selected the 'a' tags, and nothing was found. (!) In Eclipse everything works fine.

  • You are mistaken: it loads the document completely. For example, the log itself can trim it. Or even the Document toString() method may truncate it. - Vladyslav Matviienko
  • If you have been given a complete answer, mark it as accepted (the check mark next to the chosen answer). - Nicolas Chabanovsky

5 answers

 Document document = Jsoup.connect("http://www.google.ru").maxBodySize(0).get(); 

The problem is solved by maxBodySize(0), which removes the limit on the page size. You can also remove the timeout with timeout(0).

    The page you are downloading (http://www.google.ru) contains almost no HTML (apart from html, head, and script); all of its content is generated dynamically by JavaScript. This means that you do get the whole page source, but Jsoup does not execute JavaScript.

    • @BORSHEVIK The point is not that blocks are missing from the page. The output is cut off at a random place, sometimes even in the middle of a tag name. This is unlikely to be caused by peculiarities of String or of the log. I also logged the response line by line using an InputStream, and the result is the same. (!) Also, please look at the last screenshot: there I selected the 'a' tags, and nothing was found. (!) In Eclipse everything works fine. - differ

    Most likely the HTML was loaded in full, and Log simply did not output all of the text.

    Try splitting the resulting string into small pieces and logging them one at a time in a loop.

    Or simply use Jsoup to access the last tag in the loaded HTML.
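The idea of splitting the string into pieces before logging could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name LogChunker and the method splitIntoChunks are hypothetical, and the chunk size is an assumption based on Logcat cutting off a single entry at roughly 4 KB.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): splits a long string
// into pieces small enough to be logged one at a time.
class LogChunker {

    // Returns consecutive substrings of text, each at most chunkSize characters long.
    static List<String> splitIntoChunks(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null) {
            return chunks; // nothing was downloaded, so there is nothing to log
        }
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }
}
```

In onPostExecute this might be used as `for (String part : LogChunker.splitIntoChunks(desc, 4000)) Log.i("test", part);`. The value 4000 is a guess at a safe size; a smaller value may be needed if the page contains many non-ASCII characters. Logging desc.length() as well gives a quick check of whether the download itself was truncated.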

      Metalurgus is right; I have run into this problem myself. The problem is that you download the HTML page, but its JavaScript is not executed, so you do not see the full picture.

      • Try to write more detailed answers. You can include sample code, explain the solution, provide links with a more detailed analysis of the problem, etc. - Athari

      maxBodySize(0) alone did not help me. I got the full page source only when I wrote:

       doc = Jsoup.connect(mLink)
               .maxBodySize(0)
               .userAgent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
               .timeout(0)
               .get();