How to correctly read the response from the HTTP server via Sockets?

Question

I am learning to work with sockets in Java and, at the same time, with the HTTP protocol. I send a request to a site in order to get its contents. The server returns the response body compressed with gzip.

The problem is that reading the data takes a very long time, from 2 to 5 minutes.

    Socket c = new Socket("example.com", 80);
    PrintStream out = new PrintStream(c.getOutputStream());
    out.println("GET / HTTP/1.1");
    out.println("Host: example.com");
    out.println("User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0");
    out.println("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    out.println("Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
    out.println("Accept-Encoding: gzip, deflate");
    out.println("Connection: keep-alive");
    out.println("");
    out.flush();

    InputStream in = c.getInputStream();
    byte[] buffer = new byte[8192];

    // ByteArrayOutputStream is used as a byte accumulator so that all the
    // received data can be turned into a string afterwards.
    // Converting part of the stream to a string is dangerous: if the data
    // is in a multibyte encoding, one character may be split between reads.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    // InputStream.read(byte[]) returns the number of bytes read,
    // or -1 if the stream has ended (the server closed the connection)
    for (int received; (received = in.read(buffer)) != -1; ) {
        // write what was read from the stream, from 0 up to the number of bytes received
        baos.write(buffer, 0, received);
    }

    // convert to a string (it is best to specify the encoding)
    String reply = baos.toString("UTF-8");
    // this also works, but toByteArray() creates a copy of the array, and I like to optimize
    //String reply = new String(baos.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(reply);
    c.close();

If I read through a BufferedReader, I get only the headers from the web server, but much faster than with the previous method, almost instantly.

    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    while (true) {
        String line = br.readLine();
        if (line.isEmpty()) break;
    }

How do I correctly read the response body from the server?

The first method assumes that the server closes the connection once it has sent the page (i.e. HTTP/1.0 or a Connection: close header), although it is strange that this takes minutes for you; my server closes the connection after about 5 seconds. The second method stops the loop at the first empty line, which, according to the standard, separates the headers from the body.
Why are you not satisfied with the standard HttpURLConnection and the other HTTP clients for Java?
InputStream in = c.getInputStream(); byte[] bytes = IOUtils.toByteArray(in); System.out.println(new String(bytes, "UTF-8")); c.close();
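The empty line mentioned in the first comment can be located on the raw bytes, so that the stream stays positioned at the first byte of the body. Below is a minimal sketch of that idea; the class name `HeaderReader` and the method `readHeaders` are made up for illustration, and the demo runs against a canned in-memory response rather than a real socket. Reading byte by byte is used here because a BufferedReader reads ahead into its own buffer and may swallow part of the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class HeaderReader {
    // The header block ends with an empty line, i.e. CR LF CR LF.
    private static final byte[] END = {'\r', '\n', '\r', '\n'};

    /** Reads up to and including the empty line; the stream is left at the first body byte. */
    static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of END have been seen in a row
        int b;
        while (matched < END.length && (b = in.read()) != -1) {
            head.write(b);
            if (b == END[matched]) {
                matched++;
            } else {
                // a lone CR may be the start of a new terminator
                matched = (b == '\r') ? 1 : 0;
            }
        }
        // header bytes are decoded one byte per char
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] canned = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(canned);

        System.out.print(readHeaders(in));

        // whatever follows the empty line is the body
        byte[] body = new byte[5];
        int n = in.read(body);
        System.out.println("body: " + new String(body, 0, n, StandardCharsets.ISO_8859_1));
    }
}
```

With a real socket, `in` would be `c.getInputStream()`; the header block is small, so the byte-by-byte loop costs little.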

Accepted Answer · 2017-02-01T14:05:20

After receiving the empty line through the BufferedReader, you should NOT simply break and stop. Go on and read the number of bytes given in the Content-Length: xxx header of the response; those bytes are the response body.

    import java.io.DataInputStream;
    import java.io.InputStream;
    import java.io.PrintStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class HelloWorld {
        public static void main(String[] args) {
            try {
                Socket c = new Socket("example.com", 80);
                PrintStream out = new PrintStream(c.getOutputStream());
                out.println("GET / HTTP/1.1");
                out.println("Host: example.com");
                out.println("User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0");
                out.println("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                out.println("Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
                out.println("Accept-Encoding: identity"); // ask for an uncompressed body
                out.println("Connection: close");         // ask the server to close after the response
                out.println("");
                out.flush();

                InputStream sin = c.getInputStream();
                // DataInputStream does not buffer ahead, so the body bytes stay in the stream
                DataInputStream in = new DataInputStream(sin);

                String line;
                String str = "";
                int len = 0;
                // read the status line and the headers up to the empty line
                // (DataInputStream.readLine() is deprecated but is enough for ASCII headers)
                while ((line = in.readLine()) != null) {
                    if (line.indexOf("Content-Length") != -1) {
                        // "Content-Length: 123" split on non-digits gives ["", "123"]
                        len = Integer.parseInt(line.split("\\D+")[1]);
                        System.out.println("Content-Length=" + len);
                    }
                    str = str + line + '\n';
                    if (line.isEmpty()) break;
                }

                String body = "";
                if (len > 0) {
                    byte[] buf = new byte[len];
                    in.readFully(buf); // blocks until exactly len bytes have arrived
                    // one char per byte; use the charset from Content-Type for real text
                    body = new String(buf, StandardCharsets.ISO_8859_1);
                }
                System.out.println(str);
                System.out.println(body);
                c.close();
            } catch (Exception x) {
                x.printStackTrace();
            }
        }
    }

View code online: http://www.tutorialspoint.com/compile_java8_online.php?PID=0Bw_CjBb95KQMR1p4ODN2MjI5d3c
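The code above sidesteps compression by sending `Accept-Encoding: identity`. If the request keeps `Accept-Encoding: gzip` as in the question, the Content-Length bytes are the compressed body and still have to be unpacked. The following is a sketch under that assumption; `BodyReader`, `readBody` and `gzip` are illustrative names, and the demo builds its own compressed bytes in memory instead of talking to a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
class BodyReader {
    /** Reads exactly contentLength bytes and, if requested, gunzips them. */
    static byte[] readBody(InputStream in, int contentLength, boolean gzipped) throws IOException {
        byte[] raw = new byte[contentLength];
        // readFully blocks until all bytes arrive and throws EOFException if the stream ends early
        new DataInputStream(in).readFully(raw);
        if (!gzipped) return raw;

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream unzip = new GZIPInputStream(new ByteArrayInputStream(raw))) {
            byte[] buf = new byte[8192];
            for (int n; (n = unzip.read(buf)) != -1; ) plain.write(buf, 0, n);
        }
        return plain.toByteArray();
    }

    /** Demo-only helper that produces gzip bytes to stand in for a server response body. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));

        // simulate a keep-alive connection: more bytes follow the body
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(compressed);
        wire.write("next response".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        byte[] body = readBody(in, compressed.length, true);
        System.out.println(new String(body, StandardCharsets.UTF_8)); // prints <html>hello</html>
    }
}
```

Whether the body is gzipped would be decided from the `Content-Encoding` response header; here it is passed in as a flag to keep the sketch short.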

IL Mare · Answer 2 · 2017-02-03T09:51:56

I do not quite see the point of writing code that reads only HTML, let alone reading everything in such a complicated way and relying on Content-Length (I am not sure that Content-Length is required and guaranteed to be present in the response).

The following code reads everything from the socket and can be applied to all data types, including binary (without the use of third-party libraries).
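The doubt about Content-Length is justified: an HTTP/1.1 server may omit it and send `Transfer-Encoding: chunked` instead. The body then arrives as a series of chunks, each preceded by a line with its size in hexadecimal, and ends with a chunk of size 0. A minimal sketch of decoding that format follows; `ChunkedReader` and its methods are illustrative names, and the input is a canned in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class ChunkedReader {
    /** Decodes a chunked body; the stream must be positioned right after the header block. */
    static byte[] readChunked(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(data);
            int semi = sizeLine.indexOf(';'); // drop optional chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16); // chunk size is hexadecimal
            if (size == 0) break; // last chunk

            byte[] chunk = new byte[size];
            data.readFully(chunk);
            body.write(chunk, 0, size);
            readLine(data); // consume the CRLF that follows the chunk data
        }
        // skip optional trailer headers up to the closing empty line
        while (!readLine(data).isEmpty()) { }
        return body.toByteArray();
    }

    /** Reads one line without buffering ahead; returns "" for an empty line or at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] body = readChunked(new ByteArrayInputStream(wire));
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // prints Wikipedia
    }
}
```

Because the terminating zero-size chunk marks the end of the body, this also works on a keep-alive connection without waiting for the server to close the socket.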

    try (InputStreamReader reader = new InputStreamReader(inputStream)) {
        String result = readAll(reader);
    } catch (IOException e) {
        e.printStackTrace();
    }

    private static String readAll(Reader inputReader) throws IOException {
        final char[] buffer = new char[1024];
        final StringBuilder result = new StringBuilder();
        while (true) {
            // read() returns the number of chars read, or -1 at the end of the stream
            int charsRead = inputReader.read(buffer, 0, buffer.length);
            if (charsRead < 0) return result.toString();
            result.append(buffer, 0, charsRead);
        }
    }

You can also read everything with a Scanner.

    Scanner s = new Scanner(inputStream).useDelimiter("\\A");
    String result = s.hasNext() ? s.next() : "";

In general, I prefer simple code, since it is harder to make a mistake in it.

But for some unknown reason, your example takes a very long time to read the response from the server.
And in general, when there is no Content-Length header, reading the entire response from the server takes much longer than when there is one. That, really, is the whole question.
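A likely explanation for the delay, in line with the first comment on the question: the request asks for `Connection: keep-alive`, so the server leaves the socket open after the last byte, and a loop that waits for `read()` to return -1 only finishes when the server's idle timeout closes the connection. The end of the body has to be derived from the headers instead. Below is a sketch of that decision; `Framing`, `parseHeaders` and `bodyFraming` are illustrative names, and special cases such as responses that never carry a body are left out.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
class Framing {
    enum Kind { CHUNKED, CONTENT_LENGTH, UNTIL_CLOSE }

    /** Parses "Name: value" lines of a header block into a case-insensitive map. */
    static Map<String, String> parseHeaders(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = headerBlock.split("\r\n");
        for (int i = 1; i < lines.length; i++) { // index 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** Decides how the end of the body is marked; chunked takes precedence over Content-Length. */
    static Kind bodyFraming(Map<String, String> headers) {
        String te = headers.get("Transfer-Encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).contains("chunked")) return Kind.CHUNKED;
        if (headers.containsKey("Content-Length")) return Kind.CONTENT_LENGTH;
        // no framing information: the body ends only when the server closes the connection
        return Kind.UNTIL_CLOSE;
    }

    public static void main(String[] args) {
        String withLength = "HTTP/1.1 200 OK\r\ncontent-length: 5\r\n\r\n";
        String chunked = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n";
        String neither = "HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n";
        System.out.println(bodyFraming(parseHeaders(withLength))); // prints CONTENT_LENGTH
        System.out.println(bodyFraming(parseHeaders(chunked)));    // prints CHUNKED
        System.out.println(bodyFraming(parseHeaders(neither)));    // prints UNTIL_CLOSE
    }
}
```

Only in the `UNTIL_CLOSE` case is reading to the end of the stream the right strategy, and sending `Connection: close` in the request keeps that wait short.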
