I am learning to work with sockets in Java, and with the HTTP protocol along the way. I send a request to a site in order to get its contents. The server returns the response body gzip-compressed.

The problem is that the data takes a very long time to read, from 2 to 5 minutes.

    c = new Socket("example.com", 80);
    PrintStream out = new PrintStream(c.getOutputStream());
    out.println("GET / HTTP/1.1");
    out.println("Host: example.com");
    out.println("User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0");
    out.println("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    out.println("Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
    out.println("Accept-Encoding: gzip, deflate");
    out.println("Connection: keep-alive");
    out.println("");
    out.flush();

    InputStream in = c.getInputStream();
    byte[] buffer = new byte[8192];

    // ByteArrayOutputStream is used as a byte accumulator so that all the
    // received data can be converted to a string afterwards.
    // Converting part of the stream to a string is dangerous: if the data
    // arrives in a multi-byte encoding, one character may be split
    // between reads.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    // InputStream.read(byte[]) returns the number of bytes read,
    // or -1 if the stream has ended (the server closed the connection).
    for (int received; (received = in.read(buffer)) != -1; ) {
        // write what was read from the stream, from 0 up to the number of bytes read
        baos.write(buffer, 0, received);
    }

    // convert to a string (it is advisable to specify the encoding)
    String reply = baos.toString("UTF-8");
    // this also works, but toByteArray() creates a copy of the array, and I'm mommy's little optimizer
    //String reply = new String(baos.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(reply);
    c.close();

If I read through a BufferedReader, only the headers from the web server come back, but it is much faster than the previous method, almost instant.

    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    while (true) {
        String line = br.readLine();
        if (line.isEmpty()) break;
    }

How do I correctly read the response body from the server?

  • The first method relies on the server closing the connection once it has sent the page (i.e. HTTP 1.0 or a Connection: close header), although it is strange that this takes minutes for you; my server disconnects after about 5 seconds. The second method stops the loop when it meets the empty line that, according to the standard, separates the headers from the content. Why are you not satisfied with the standard HttpURLConnection and the other HTTP clients for Java? - zRrr
  • I'm trying to make a proxy server. - or_die
  • @or_die try IOUtils.toByteArray(inputStream); to read the whole stream, and measure the result. - Senior Pomidor
  • @SeniorAutomator, reading the stream takes approximately the same time. InputStream in = c.getInputStream(); byte[] bytes = IOUtils.toByteArray(in); System.out.println(new String(bytes, "UTF-8")); c.close(); - or_die
  • @or_die and what time do you get, and what time do you expect? - Senior Pomidor
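
To illustrate zRrr's point about the first method: a read-until-EOF loop only finishes when the server closes the connection, and with Connection: keep-alive the server is free to hold it open until its own idle timeout. Below is a minimal sketch of a request that asks the server to close right after the response. The class and method names (RequestSketch, buildGet) are hypothetical, and the explicit "\r\n" is there because println() writes the platform line separator, which is not necessarily the CRLF that HTTP specifies.

```java
import java.nio.charset.StandardCharsets;

class RequestSketch {
    // Builds a minimal GET request as bytes.
    // "Connection: close" asks the server to close the socket after the
    // response, so a read-until-EOF loop on the client ends promptly.
    static byte[] buildGet(String host, String path) {
        String request =
                "GET " + path + " HTTP/1.1\r\n"
              + "Host: " + host + "\r\n"
              + "Connection: close\r\n"
              + "\r\n"; // empty line terminates the header block
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Print the request instead of sending it, so the sketch runs offline.
        System.out.print(new String(buildGet("example.com", "/"), StandardCharsets.US_ASCII));
    }
}
```

The resulting bytes can be written to c.getOutputStream() in place of the println() calls, followed by flush().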

2 answers

After reading the empty line, you should NOT break and stop there. You should go on to read the number of bytes given in the Content-Length: xxx header of the site's response; those bytes are the body of the response.

    import java.io.*;
    import java.net.*;
    import java.nio.charset.StandardCharsets;

    public class HelloWorld {
        public static void main(String[] args) {
            try {
                Socket c = new Socket("example.com", 80);
                PrintStream out = new PrintStream(c.getOutputStream());
                out.println("GET / HTTP/1.1");
                out.println("Host: example.com");
                out.println("User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0");
                out.println("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                out.println("Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
                out.println("Accept-Encoding: identity");
                out.println("Connection: close");
                out.println("");
                out.flush();

                // DataInputStream can read both the header lines and the raw body bytes
                DataInputStream in = new DataInputStream(c.getInputStream());
                StringBuilder headers = new StringBuilder();
                int len = 0;
                while (true) {
                    // deprecated, but unlike BufferedReader it does not read ahead into the body
                    String line = in.readLine();
                    if (line == null) break;      // connection closed before the headers ended
                    if (line.indexOf("Content-Length") != -1) {
                        // take the first run of digits in the header line
                        len = Integer.parseInt(line.split("\\D+")[1]);
                        System.out.println("LINEE=" + len);
                    }
                    headers.append(line).append('\n');
                    if (line.isEmpty()) break;    // empty line: end of the headers
                }

                String body = "";
                System.out.println("i=" + len);
                if (len > 0) {
                    byte[] buf = new byte[len];
                    in.readFully(buf);            // read exactly Content-Length bytes
                    // ISO-8859-1 maps each byte to one char; use the page's charset if it is known
                    body = new String(buf, StandardCharsets.ISO_8859_1);
                }
                System.out.println(headers);
                System.out.println(body);
                c.close();
            } catch (Exception x) {
                x.printStackTrace();
            }
        }
    }

View code online: http://www.tutorialspoint.com/compile_java8_online.php?PID=0Bw_CjBb95KQMR1p4ODN2MjI5d3c

  • Thank you so much! - or_die
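
The code in this answer sidesteps compression by sending Accept-Encoding: identity. If the request keeps Accept-Encoding: gzip as in the question, the bytes after the headers are compressed and need to go through GZIPInputStream before being turned into a string. A minimal sketch, assuming the whole body has already been read into a byte array and the response really carries Content-Encoding: gzip; GzipBodySketch, gunzip and gzip are hypothetical names, and gzip() exists only so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodySketch {
    // Decompresses a gzip-encoded body that has already been read in full.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            for (int n; (n = gz.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Helper for the demo only: produces gzip bytes to stand in for a server response body.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        // Decode to text only after decompressing; UTF-8 is an assumption here.
        System.out.println(new String(gunzip(body), StandardCharsets.UTF_8));
    }
}
```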

I do not quite see the point of writing code that reads only HTML, and even less of reading everything in such a complicated way and relying on Content-Length (I'm not sure that Content-Length is required or guaranteed to be present in the response).

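The following code reads the entire socket stream until the server closes the connection, without any third-party libraries. As written it goes through a Reader, so it suits text; for binary data the same loop works with an InputStream and a byte[] buffer, since a Reader decodes the bytes into characters.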

    // inputStream is the socket's input stream, e.g. c.getInputStream()
    try (InputStreamReader reader = new InputStreamReader(inputStream)) {
        String result = readAll(reader);
    } catch (IOException e) {
        e.printStackTrace();
    }

    private static String readAll(Reader inputReader) throws IOException {
        final char[] buffer = new char[1024];
        final StringBuilder result = new StringBuilder();
        while (true) {
            // read() returns the number of chars read, or -1 at the end of the stream
            int charsRead = inputReader.read(buffer, 0, buffer.length);
            if (charsRead < 0) return result.toString();
            result.append(buffer, 0, charsRead);
        }
    }

You can also read the stream using a Scanner.

    Scanner s = new Scanner(inputStream).useDelimiter("\\A");
    String result = s.hasNext() ? s.next() : "";

In general, I prefer simple code, since it is harder to make a mistake in it.

  • Simpler is of course better. But for some unknown reason your example takes a very long time to read the response from the server. And in general, when there is no Content-Length header, reading the entire response from the server takes much longer than when there is one. That, really, is the whole question. - or_die
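
On the point about a missing Content-Length: in HTTP/1.1 a response without that header is typically either sent with Transfer-Encoding: chunked or delimited by the server closing the connection. A loop that waits for EOF on a keep-alive connection then sits until the server's idle timeout, which is a likely explanation for the delay discussed in this thread. The sketch below picks the reading strategy from the headers. BodyFramingSketch, readLine and readBody are hypothetical names, the status line and chunk trailers are ignored, and error handling is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class BodyFramingSketch {
    // Reads one line ending in LF (any CR is dropped), byte by byte,
    // so nothing past the line is consumed from the stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        for (int b; (b = in.read()) != -1 && b != '\n'; ) {
            if (b != '\r') line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Reads the headers, then the body according to how the response is framed.
    static byte[] readBody(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        readLine(in); // status line, ignored in this sketch
        int contentLength = -1;
        boolean chunked = false;
        for (String h; !(h = readLine(in)).isEmpty(); ) {
            String lower = h.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-length:")) {
                contentLength = Integer.parseInt(h.substring("content-length:".length()).trim());
            } else if (lower.startsWith("transfer-encoding:") && lower.contains("chunked")) {
                chunked = true;
            }
        }

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        if (chunked) {
            while (true) {
                // The chunk size is hexadecimal; anything after ';' is an extension.
                int size = Integer.parseInt(readLine(in).split(";")[0].trim(), 16);
                if (size == 0) break; // last chunk; trailers are not read here
                byte[] chunk = new byte[size];
                in.readFully(chunk);
                body.write(chunk);
                readLine(in); // CRLF that follows the chunk data
            }
        } else if (contentLength >= 0) {
            byte[] buf = new byte[contentLength];
            in.readFully(buf); // exactly Content-Length bytes, no waiting for EOF
            body.write(buf);
        } else {
            // No framing information: the body ends when the server closes the connection.
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) body.write(buf, 0, n);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String response = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
                + "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n";
        InputStream in = new ByteArrayInputStream(response.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(readBody(in), StandardCharsets.ISO_8859_1)); // prints "Hello World"
    }
}
```

With a real socket it is worth wrapping c.getInputStream() in a BufferedInputStream before passing it in, because readLine() here pulls one byte at a time.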