For example, I set the Windows-1251 (Cyrillic) encoding in an HTML document, enter the string "Егор", and use JavaScript to display the code of the first character:

var str = 'Егор'; str.charCodeAt(0); // 1056

The result is 1056. I look at the character table and cannot understand where this code came from.

I've been searching for information, but everywhere I find only tables and basic concepts (what an encoding is).

Question: how is this all interconnected, and how does it work?

    2 answers

    In order.

    1. Reading the documentation for the charCodeAt() function, we learn the following:

    All strings are internally encoded in Unicode.

    It doesn't matter what encoding the page is written in, whether windows-1251 or utf-8. Inside the JavaScript interpreter, all strings are converted to a single Unicode form, and each character has its own code.
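    A minimal sketch of this point, runnable in any browser console or Node.js: whatever charset the page declares, the codes that charCodeAt() reports come from Unicode:

    ```javascript
    // Once a string exists inside the interpreter, it is a sequence of
    // Unicode (UTF-16) code units, independent of the page's charset.
    var str = 'Егор';
    for (var i = 0; i < str.length; i++) {
      console.log(str[i] + ' -> U+' + str.charCodeAt(i).toString(16).toUpperCase());
    }
    // Е -> U+415, г -> U+433, о -> U+43E, р -> U+440
    ```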

    Further.

    1. Studying the Unicode table itself and its organization (useful: Cyrillic characters, a more visual table), we learn that the Cyrillic letter "Е" sits at position U+0415 (0x415 in hexadecimal).

    2. All of the above is confirmed by the result of "Е".charCodeAt(0), which gives 1045, not 1056 as in your case: 1045 (dec) = 0x415 (hex).

    3. We look in the Unicode table at row 0410, column 5. At the intersection is our letter Е.
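    The row-and-column lookup in step 3 is simple arithmetic, sketched here: the row label gives the high digits of the code point, the column gives the last one:

    ```javascript
    // Row 0410, column 5: the code point is the sum of the two.
    var code = 0x0410 + 0x5;                // 0x415
    console.log(code);                      // 1045
    console.log(String.fromCharCode(code)); // "Е"
    ```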

     alert("Егор".charCodeAt(0)); 

    • and where did 1056 come from then? - Sergey
    • @Sergey I have no idea how that can happen. Added an example to the answer. - slippyk
    • I set <meta charset="windows-1251"> and it displays the code 1056; if I use utf-8, it outputs 1045. - user190134

Judging by the result, you saved the file in UTF-8 encoding. In that encoding, the string Егор corresponds to the byte sequence d0 95 d0 b3 d0 be d1 80.

The browser, seeing the <meta charset=windows-1251>, interpreted this sequence as the 8 characters Р•РіРѕСЂ. The first of them is the Russian letter "Р", which has the code 1056.