@platedz, in whatever language you like, PHP included.
The utf-8 characters must first be translated into ucs codes (Unicode code points), and those, where possible, into cp1251. Naturally, not every ucs code (for example: most of latin-1, box-drawing characters, hieroglyphs, etc.) can be translated into cp1251.
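To make the second step concrete, here is a rough sketch of a partial ucs to cp1251 mapping. The function name ucs_to_cp1251 and the -1 "no mapping" return value are my own choices for illustration. It only covers ascii, the Russian alphabet and the № sign; the real cp1251 table has more entries (Ukrainian and Belarusian letters, quotes and so on) that are left out here.

```c
/* Partial ucs -> cp1251 mapping (illustrative sketch, not the full table).
   Returns the cp1251 byte value, or -1 if this sketch has no mapping. */
int ucs_to_cp1251(unsigned long ucs)
{
    if (ucs < 0x80)                       /* ascii maps to itself */
        return (int)ucs;
    if (ucs >= 0x410 && ucs <= 0x44F)     /* U+0410..U+044F -> 0xC0..0xFF */
        return (int)(ucs - 0x410 + 0xC0);
    switch (ucs) {
    case 0x401:  return 0xA8;             /* capital IO (Ё) */
    case 0x451:  return 0xB8;             /* small io (ё) */
    case 0x2116: return 0xB9;             /* numero sign (№) */
    default:     return -1;               /* not covered by this sketch */
    }
}
```

A caller would typically substitute something like '?' when -1 comes back.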
How to translate utf-8 to ucs.
Look at the first (highest, sign) bit of the utf-8 byte. If it is 0, the ucs code is equal to the value of this byte (this is ascii).
If the first two bits are 10 (a continuation byte appearing where a character should start), or the byte value is 0xff or 0xfe, this is an error in utf-8.
Otherwise, analyze the high bits of the byte: they are several 1s followed by a single 0. The number of 1s is equal to the number of utf-8 bytes that encode the ucs code. The rest of the byte holds the high bits of the encoded ucs code. All the following bytes of this character must begin with 10, and the remaining 6 bits of each encode the next part of the ucs code.
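The rules above can be sketched in C roughly like this. The function name utf8_decode and its signature are made up for the example. I assume sequences of at most 4 bytes, so any leading byte from 0xf8 up (which includes 0xfe and 0xff) is treated as an error, and the sketch does not check for overlong encodings or surrogates.

```c
#include <stddef.h>

/* Decodes one utf-8 sequence starting at s, with n bytes available.
   Stores the ucs code in *ucs and returns the number of bytes consumed,
   or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *ucs)
{
    size_t len, i;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                      /* 0xxxxxxx: ascii */
        *ucs = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 110xxxxx: 2 bytes */
        len = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3 bytes */
        len = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4 bytes */
        len = 4; cp = s[0] & 0x07;
    } else {
        return 0;                           /* 10xxxxxx or 0xf8..0xff */
    }
    if (n < len)
        return 0;                           /* sequence is cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)          /* must start with bits 10 */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);     /* append the next 6 bits */
    }
    *ucs = cp;
    return len;
}
```

To walk a whole string, call it in a loop and advance the pointer by the returned length, stopping (or skipping one byte) when it returns 0.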
All Cyrillic letters (the U+0400 to U+04FF block) are encoded with 2 utf-8 bytes. For example, the Russian letter А (ucs code 0x410) in utf-8 is the 2 bytes 0xd0 0x90:
1101 0000 1001 0000
Write it like this (on the left we see 110, so there will be 2 utf-8 bytes in total):
110 10000 10 010000
Take the 11 bits (5 from the first byte and 6 from the second) and form the ucs code from them:
10000010000
or, split into nibbles, 100 0001 0000, i.e. 0x410
Another example is the numero sign №.
In utf-8, № is 0xe2 0x84 0x96:
1110 0010 1000 0100 1001 0110
Drop the 1110 and 10 prefixes, which leaves 4 + 6 + 6 bits:
0010 00 0100 01 0110
Regroup them into nibbles:
0010 0001 0001 0110 == 0x2116
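The same arithmetic for the 3-byte case, as a small C sketch. The name utf8_decode3 is invented for the example, and it assumes the three bytes have already been checked to be a valid sequence (prefixes 1110, 10, 10).

```c
/* Combines the payload bits of a 3-byte utf-8 sequence into a ucs code.
   No validation is done here: str[0..2] are assumed to be well formed. */
unsigned long utf8_decode3(const unsigned char *str)
{
    unsigned long b1 = str[0] & 0x0F;   /* low 4 bits of 1110xxxx */
    unsigned long b2 = str[1] & 0x3F;   /* low 6 bits of 10xxxxxx */
    unsigned long b3 = str[2] & 0x3F;   /* low 6 bits of 10xxxxxx */
    return (b1 << 12) | (b2 << 6) | b3;
}
```

For the bytes 0xe2 0x84 0x96 this gives 0x2116, matching the bit-by-bit calculation above.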
In fact, it is faster to write a program (I find it easier to use C, but you are interested in PHP) than to explain it in Russian.
For a 2-byte sequence in str[], put the low 5 bits of the first byte into b1 and the low 6 bits of the second byte into b2:
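int b1, b2, ucs;
b1 = str[0] & 0x1f;      /* low 5 bits of the leading byte 110xxxxx */
b2 = str[1] & 0x3f;      /* low 6 bits of the continuation byte 10xxxxxx */
ucs = (b1 << 6) | b2;    /* 11 bits in total */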
or, if PHP has no bit operations (I hope it at least has a remainder operator), then
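b1 = str[0] % 32;        /* str[] must hold unsigned byte values (0..255), or % can go negative */
b2 = str[1] % 64;        /* same as & 0x3f */
ucs = b1*64 + b2;        /* same as (b1 << 6) | b2 */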