How to get pictures with the Jsoup library in Java over HTTPS?

Question

The whole question is in the title. I used the method from the link, and it works fine over HTTP, but over HTTPS it finds nothing. I tried to download pictures from Twitter with this library. When I opened the page in a browser and searched through the element code by hand, I found the image links, but the library does not find them. Please tell me what is wrong.

My code is:

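    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    Document doc = Jsoup.connect("https://www.google.by/search?q=images&biw=1680&bih=913&source=lnms&tbm=isch&sa=X&ved=0CAYQ_AUoAWoVChMIhMivvNX2xwIVBlksCh0JUAbr").get();
    StringBuilder stringBuilder = new StringBuilder();
    int l = 0; // number of <img> elements found
    // Collect the src attribute of every <img> element on the page
    for (Element e : doc.select("img")) {
        stringBuilder.append(e.attr("src")).append("\n");
        l++;
    }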

If this library cannot do it, please suggest another way. Or should I write my own parser?

This code works; I just had to take the page's redirect into account.

Community ♦ 1 · Answer 1 · 2015-09-15T08:29:31

I found the problem: the link was redirected, and because I did not take that into account, the pictures were not visible. This code helped me.
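The code from this answer was not included. As a hedged sketch of following a redirect by hand with only the JDK (a swapped-in approach, not Jsoup and not necessarily what the author did): HttpURLConnection does not automatically follow a redirect that changes the protocol (for example http to https), so the loop below reads the Location header itself. The class RedirectFollower, its methods and the hop limit of 5 are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RedirectFollower {

    // Hypothetical hop limit, to avoid endless redirect loops
    static final int MAX_HOPS = 5;

    // Status codes whose Location header should be followed
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a (possibly relative) Location header against the current URL
    static String resolveRedirect(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // Returns the first non-redirect response, following Location headers by hand,
    // including http <-> https switches that HttpURLConnection will not follow itself
    static HttpURLConnection openFollowingRedirects(String url) throws IOException {
        String current = url;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn; // caller reads conn.getInputStream() and parses it
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header: " + current);
            }
            current = resolveRedirect(current, location);
        }
        throw new IOException("Too many redirects, starting at " + url);
    }
}
```

Jsoup's own Connection already follows redirects by default (see its followRedirects(boolean) setting), so a manual loop like this matters mainly when the page is fetched with HttpURLConnection directly.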

Android Android Android 6,774 2 13 34 · Answer 2 · 2015-09-14T15:06:00

In practice, parsing is not always as easy as in the example. I have three apps on the market that parse websites, and plain doc.select worked for me only very rarely. Almost everywhere I ended up with something like this:

 html.getElementsByClass("comment").get(i).getElementsByClass("datacom").get(0).text();

In general, you end up with quite complex queries to reach the element you need. First try to reach a single element while stepping through in the debugger; once that works, wrap it in a loop over all of them.

Thank you, I will definitely try this tomorrow. Still, the method I used works fine over HTTP; on VK, at least, it listed the links correctly.

Android Android Android 6,774 2 13 34 · Answer 3 · 2015-09-15T10:07:42


Update

Out of curiosity, I also looked into how to handle HTTPS. This is what I ended up with:

    import android.os.AsyncTask;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;

    class HTTPRequest extends AsyncTask<Void, Void, Void> {

        @Override
        protected Void doInBackground(Void... params) {
            HttpClient httpclient = new DefaultHttpClient();
            HttpGet httpget = new HttpGet("https://www.google.by/search?q=images&biw=1680&bih=913&source=lnms&tbm=isch&sa=X&ved=0CAYQ_AUoAWoVChMIhMivvNX2xwIVBlksCh0JUAbr");
            try {
                HttpResponse response = httpclient.execute(httpget);
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    // Read the response body; convertStreamToString also closes the stream
                    InputStream instream = entity.getContent();
                    String result = convertStreamToString(instream);
                    // Parse the HTML with Jsoup and collect the image links
                    // (PageLoader.getImages is the author's own helper, not shown here)
                    Document html = Jsoup.parse(result);
                    ArrayList<String> images = PageLoader.getImages(html);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void aVoid) {
            super.onPostExecute(aVoid);
        }

        public static String convertStreamToString(InputStream is) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is));
            StringBuilder sb = new StringBuilder();
            String line;
            try {
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append("\n");
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            return sb.toString();
        }
    }

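That's how I managed to get your pictures =) html.select("img").attr("src") (note that Elements.attr("src") returns the attribute of the first matching element that has it; to collect every link, loop over html.select("img") and call attr("src") on each Element, as in the question's code).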
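The convertStreamToString helper in the code above decodes the response with the platform default charset and replaces every line ending with \n via readLine(). A minimal alternative sketch reads the raw bytes and decodes them once with an explicit charset; UTF-8 as the default is an assumption here, since a real page may declare another charset. The names StreamUtil and readFully are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamUtil {

    // Reads the whole stream into memory and decodes it with the given charset.
    // The stream is closed when reading finishes (try-with-resources).
    static String readFully(InputStream in, Charset charset) throws IOException {
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    // Convenience overload that assumes UTF-8
    static String readFully(InputStream in) throws IOException {
        return readFully(in, StandardCharsets.UTF_8);
    }
}
```

Unlike the readLine() loop, this keeps the original line endings and does not append a trailing newline.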

I don't mean to brag, but so far I haven't found a better way. This method works well in the cases I checked and gives me what I need; it even detects pictures encoded in Base64, which are then easy to obtain by decoding. Overall I am pleased with the result and hope it helps others.
By the way, you are using rather cumbersome code for this; with Jsoup there is no need to create threads yourself, and so on.
I agree that it is cumbersome, but I could not find a shorter way to decode the byte array I received as output.
That's the beauty of Jsoup: you don't have to mess with streams and can do it in a couple of lines.
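For the Base64-encoded pictures mentioned in this thread, the src attribute holds a data URI of the form data:image/png;base64,<payload>, and the picture itself is just the decoded payload. A minimal sketch using java.util.Base64 (Java 8+, Android API 26+; older Android versions have android.util.Base64 instead); the class DataUri and its methods are hypothetical names.

```java
import java.util.Base64;

final class DataUri {

    // Decodes a "data:<mime>;base64,<payload>" URI into raw bytes.
    // Returns null if the string is not a Base64 data URI or the payload is malformed.
    static byte[] decodeBase64DataUri(String src) {
        if (src == null || !src.startsWith("data:")) {
            return null;
        }
        int comma = src.indexOf(',');
        if (comma < 0) {
            return null;
        }
        String header = src.substring("data:".length(), comma); // e.g. "image/png;base64"
        if (!header.endsWith(";base64")) {
            return null; // percent-encoded (non-Base64) data URIs are not handled here
        }
        String payload = src.substring(comma + 1);
        try {
            // The MIME decoder ignores line breaks and other non-alphabet characters
            return Base64.getMimeDecoder().decode(payload);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Returns the MIME type of a data URI, e.g. "image/png", or null
    static String mimeType(String src) {
        if (src == null || !src.startsWith("data:")) {
            return null;
        }
        int comma = src.indexOf(',');
        if (comma < 0) {
            return null;
        }
        String header = src.substring("data:".length(), comma);
        int semi = header.indexOf(';');
        return semi < 0 ? header : header.substring(0, semi);
    }
}
```

The resulting byte array can be written to a file or, on Android, passed to BitmapFactory.decodeByteArray.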
