For example, I set the Windows-1251 (Cyrillic) encoding in an HTML document, enter the string "Егор", and use JavaScript to display the code of the first character:

var str = 'Егор'; str.charCodeAt(0); // 1056

The result is 1056. I look at the character table and cannot understand where this code came from.

I've been searching for information, but everywhere I find only tables and basic concepts (what an encoding is).

Question: how is this all interconnected, and how does it work?

    2 answers

    In order.

    1. Reading the documentation for the charCodeAt() function, we learn the following:

    All strings are internally encoded in Unicode.

    It doesn't matter what encoding the page is written in, whether windows-1251 or utf-8. Inside the JavaScript interpreter, all strings are converted to a single Unicode form, and each character has its own code.
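    A minimal sketch of this point, runnable in any browser console or Node.js: whatever charset the page declares, the codes that charCodeAt() reports come from Unicode:

    ```javascript
    // Once a string exists inside the interpreter, it is a sequence of
    // Unicode (UTF-16) code units, independent of the page's charset.
    var str = 'Егор';
    for (var i = 0; i < str.length; i++) {
      console.log(str[i] + ' -> U+' + str.charCodeAt(i).toString(16).toUpperCase());
    }
    // Е -> U+415, г -> U+433, о -> U+43E, р -> U+440
    ```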

    Further.

    1. Studying the Unicode table itself and its organization (useful: Cyrillic characters, a more visual table), we learn that the Cyrillic letter "Е" sits at position U+0415 (0x415 in hexadecimal).

    2. All of the above is confirmed by the result of "Е".charCodeAt(0), which gives 1045, not 1056 as in your case: 1045 (dec) = 0x415 (hex).

    3. We look in the Unicode table at row 0410, column 5. At the intersection is our letter Е.
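    The row-and-column lookup in step 3 is simple arithmetic, sketched here: the row label gives the high digits of the code point, the column gives the last one:

    ```javascript
    // Row 0410, column 5: the code point is the sum of the two.
    var code = 0x0410 + 0x5;                // 0x415
    console.log(code);                      // 1045
    console.log(String.fromCharCode(code)); // "Е"
    ```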

     alert("Егор".charCodeAt(0)); 

    • and where did 1056 come from then? - Sergey
    • @Sergey I have no idea how that can happen. Added an example to the answer. - slippyk
    • I set <meta charset="windows-1251"> and it displays the code 1056; if I use utf-8, it outputs 1045. - user190134

Judging by the result, you saved the file in UTF-8 encoding. In that encoding, the string Егор corresponds to the byte sequence d0 95 d0 b3 d0 be d1 80.

The browser, seeing the <meta charset=windows-1251>, interpreted this sequence as the 8 characters Р•РіРѕСЂ. The first of them is the Russian letter "Р", which has the code 1056.