From a Java servlet I send a GET request to a third-party web server, receive the HTML in response, and put it into a string. When I look at the string, the Russian letters in it appear as diamonds with question marks. The HTML document itself does not specify an encoding anywhere, so I tried manually re-encoding the string to UTF-8. It did not help.

    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String url = "http://meteonovosti.ru/index.php?index=8&value=26063";
        URL obj = new URL(url);
        HttpURLConnection connection = (HttpURLConnection) obj.openConnection();
        connection.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();
        String answer = response.toString();
        System.out.println("Original answer: " + answer);
        answer = new String(answer.getBytes(), "UTF-8");
        System.out.println("After recoding: " + answer);
    }

Both outputs are identical (parts of the HTML omitted):

Original answer: - :

After recoding: - :

Eclipse is configured to use UTF-8 everywhere. What do I need to do to get the Russian letters?

    1 answer

    The text simply does not arrive in UTF-8. You are creating a string from a byte array, and a byte array knows nothing about encodings: it is just a sequence of bytes. To turn it into a string, you have to specify the encoding:

     answer = new String(answer.getBytes(), "UTF-8"); 

    You specified UTF-8, but the source text was converted to bytes in a different encoding. Which one? I followed your link http://meteonovosti.ru/index.php?index=8&value=26063 and opened the developer console in the browser. The Network tab shows the HTTP headers, and one of them was:

     Content-Type: text/html; charset=koi8-r 

    So your Russian text was sent in the KOI8-R encoding, and therefore the string needs to be initialized like this:

     answer = new String(answer.getBytes(), "KOI8-R"); 

    UPD.

    Sorry, I was too hasty: the encoding has to be specified for the InputStream, not for the string being created.

     BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream(), "KOI8-R")); 
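
    To illustrate why the charset has to be given to the reader rather than applied to the finished string, here is a minimal self-contained sketch. It assumes the response bytes are KOI8-R and simulates them with a byte array instead of a real connection; the class name DecodeDemo and the helper decode are hypothetical, not part of the original servlet. Once the bytes have been decoded with the wrong charset, the invalid sequences are replaced with U+FFFD and the original letters are lost, so re-encoding afterwards cannot recover them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decodes a response body with the given charset.
    // The ByteArrayInputStream stands in for connection.getInputStream().
    static String decode(byte[] body, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset koi8r = Charset.forName("KOI8-R");
        String original = "\u041f\u043e\u0433\u043e\u0434\u0430"; // the Russian word for "weather"
        byte[] wire = original.getBytes(koi8r);                   // bytes as the server would send them

        // Wrong charset at the reader: the KOI8-R bytes are not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        String wrong = decode(wire, StandardCharsets.UTF_8);
        System.out.println(wrong.contains("\uFFFD"));   // true

        // Trying to "fix" the string afterwards only re-encodes the U+FFFD characters.
        String repaired = new String(wrong.getBytes(StandardCharsets.UTF_8), koi8r);
        System.out.println(repaired.equals(original));  // false

        // Correct charset at the reader: the text survives intact.
        String right = decode(wire, koi8r);
        System.out.println(right.equals(original));     // true
    }
}
```

    The failed "repair" step is also a plausible source of the kind of garbage reported in the comments below: U+FFFD encoded as UTF-8 and then read as KOI8-R comes out as a repeating three-character pattern.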

    UPD 2.

    Here is a picture of how it looks in the browser. Usually you press F12 to open a similar panel, then load the page you need from the address bar; the Network tab will list all HTTP requests, and the main request is usually at the top. My screenshot is from Firefox; in Chrome everything looks similar.

    [Screenshot: Network tab of the Firefox developer tools]
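
    Instead of looking the charset up by hand and hard-coding it, the servlet could read it from the same Content-Type header. The following is a rough sketch, not a complete parser: the class ContentTypeCharset and the method charsetOf are hypothetical names, and the fallback charset is an assumption you would choose yourself. In the servlet, the header value would come from connection.getContentType().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value such as "text/html; charset=koi8-r". Returns the fallback
    // when the header is missing, has no charset, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String prefix = "charset=";
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, prefix, 0, prefix.length())) {
                String name = p.substring(prefix.length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.UTF_8; // assumed default, adjust as needed
        System.out.println(charsetOf("text/html; charset=koi8-r", fallback));
        System.out.println(charsetOf("text/html", fallback));
        System.out.println(charsetOf(null, fallback));
    }
}
```

    The result can then be passed as the second argument of the InputStreamReader constructor. Reading the header on every response also avoids hard-coding a value that may differ from what the server actually sends.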

    • Unfortunately, now the Russian letters look like this: "О©╫О©╫О©╫О©╫". - Alisa Korn
    • @AlisaKorn, sorry, I have updated the answer - iksuy
    • It worked! Thank you very much! Though I still cannot find where the encoding is specified. - Alisa Korn
    • @AlisaKorn, I have attached a screenshot - iksuy
    • Thanks, found it. But now there is a new puzzle: as far as I can see, the Response Headers report the encoding as "charset=windows-1251". - Alisa Korn