PHP strange encoding behavior

Question

There are 3 files: index.html, a form for data entry (or just a way to get to index.php); index.php, a script that does some work and passes the result on; and output.html, a template that displays the result of the script.

The essence of the problem is shown in the picture:

The script performs a practice exercise: it swaps the two characters in each adjacent pair. With English words everything works; the problem appears with Russian text. My approach:

    function todo(){
        $src = "Почему один и тот жетекст отображается по-разному?";
        //$target = utf8_decode($src);
        $target = $src;
        $pos = 0;
        $tempstr = $target;
        //echo strlen(utf8_decode($target)) . "<br>";
        while(true){
            if($pos + 2 < strlen(utf8_decode($target))){
                $a = substr($target, $pos, 1);
                $b = substr($target, $pos + 1, 1);
                $tempstr = substr_replace($tempstr, $a, $pos + 1, 1);
                $tempstr = substr_replace($tempstr, $b, $pos, 1);
                $pos += 2;
            }else{
                break;
            }
        }
        return "initial: \t" . $target . "<br>" . "Result: \t" . $tempstr;
    }

Just in case, here is the "template":

    <!DOCTYPE html>
    <html><title><?=$title?></title>
    <body>
    <form action="index.php" method="post">
        <input type="submit" name="submit" value="RELOAD" />
    </form>
    <font size="5" color="blue" face="Calibri"><pre>
    <?php if($result): ?>
        <span style="color:blue; font-weight:bold">Результат работы скрипта <?=$scriptName;?>:<br><?=$result;?></span>
    <?php else: ?>
        <span style="color:red; font-weight:bold"> Пусто.. <img src="megusto.png" alt="me gusto">
    <?php endif; ?>
    </span></font></pre>
    </body>
    </html>

Chrome is set to UTF-8 encoding. Notepad++ saves the files as UTF-8, and its "auto-detect character encoding" option is disabled. As the commented-out lines show, I also tried utf8_decode, to no avail. Judging by the echo output after each line, I get the impression that substr somehow corrupts $target.

SOLUTION

I followed the advice and solved the problem with mb_strlen and mb_substr. However, a few things had to change:

    function todo(){
        echo "<pre>";
        $target = "велосипедный костыль-костыльный велосипед";
        echo "target: \t" . $target . "<br>";
        $res = "";
        $pos = 0;
        while(true){
            if($pos > mb_strlen($target)) break;
            $a = mb_substr($target, $pos, 1);
            $b = mb_substr($target, $pos + 1, 1);
            $res .= $b . $a;
            $pos += 2;
        }
        echo "result: \t" . $res;
        echo "</pre>";
    }

Firstly, it turned out that there are a bunch of interesting functions for multibyte strings, but no analog of substr_replace. So I had to build a new string instead of replacing characters in place. Secondly, for mbstring to work, the module must be enabled with extension=php_mbstring.dll in the php.ini file; it was disabled initially (at least for me). In any case, thanks to everyone who responded: such a simple task, and I learned so much about PHP.
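For readers who still want in-place replacement, here is a minimal sketch of a character-based counterpart to substr_replace, built from mb_substr. The name mb_substr_replace_sketch is hypothetical (it is not part of PHP or mbstring), and the sketch assumes UTF-8 input with a non-negative start and length. It also shows one way to check that the extension is loaded.

```php
<?php
// Stop early if mbstring is not enabled (see the php.ini note above).
if (!extension_loaded('mbstring')) {
    exit("mbstring is not enabled; check php.ini\n");
}

// Hypothetical helper, not part of PHP: replaces $length characters
// (not bytes) starting at character offset $start.
// Assumes UTF-8 input and non-negative $start and $length.
function mb_substr_replace_sketch(string $str, string $replacement, int $start, int $length): string
{
    $before = mb_substr($str, 0, $start, 'UTF-8');
    $after  = mb_substr($str, $start + $length, null, 'UTF-8');
    return $before . $replacement . $after;
}

echo mb_substr_replace_sketch("костыль", "К", 0, 1), "\n"; // Костыль
```

The file itself has to be saved as UTF-8 for the string literals to behave as shown.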

Because in UTF-8 Russian letters are encoded with 2 bytes ...
@Torin, take a look at the functions for multibyte strings: php.net/manual/ru/ref.mbstring.php
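A small sketch of what "2 bytes per letter" means in practice, assuming the source file is saved as UTF-8. The variable name $word is only for illustration.

```php
<?php
$word = "тот"; // three Cyrillic letters, two bytes each in UTF-8

echo strlen($word), "\n";                            // 6: strlen() counts bytes
echo mb_strlen($word, 'UTF-8'), "\n";                // 3: mb_strlen() counts characters
echo bin2hex(substr($word, 0, 1)), "\n";             // d1: only the first byte of "т"
echo bin2hex(mb_substr($word, 0, 1, 'UTF-8')), "\n"; // d182: both bytes of "т"
```

This is why substr with a length of 1 returns half of a letter, and why swapping such halves produces broken characters in the browser.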

KAGG Design · Accepted Answer · 2016-12-01T03:03:16

To work correctly with multibyte strings, use mb_strlen and mb_substr.
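As a rough illustration of this advice, the pair swap could be written as a reusable function along these lines. The name swap_pairs is hypothetical, and the encoding is passed explicitly on the assumption that the input is UTF-8.

```php
<?php
// Hypothetical helper: swaps the characters in each adjacent pair
// of a UTF-8 string, working on characters rather than bytes.
function swap_pairs(string $text): string
{
    $length = mb_strlen($text, 'UTF-8');
    $result = '';
    for ($pos = 0; $pos < $length; $pos += 2) {
        $a = mb_substr($text, $pos, 1, 'UTF-8');
        // Empty string when the last character has no partner.
        $b = mb_substr($text, $pos + 1, 1, 'UTF-8');
        $result .= $b . $a;
    }
    return $result;
}

echo swap_pairs("один"), "\n"; // дони
```

With an odd number of characters the last one simply stays in place.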

Thanks, that turned out to be informative - torin.dmitry
Anytime; it is a pleasure to deal with grateful people. It's not always like that here)) - KAGG Design

Visman · Answer 2 · 2016-12-01T03:20:22

Here is where you can show the advantage of regular expressions over string functions:

    <?php
    $input = "Почему один и тот жетекст отображается по-разному?";
    $output = preg_replace('%(.)(.)%us', '$2$1', $input);
    var_dump($input, $output);

Result:

    string(92) "Почему один и тот жетекст отображается по-разному?"
    string(92) "оПечумо ид н иот тежетск ттобоаражтеясп -оарнзмо?у"
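The u modifier is what makes this work: it tells PCRE to treat the pattern and subject as UTF-8, so each dot matches a whole character. The s modifier additionally lets the dot match newlines. A small sketch of the difference, using a short sample word chosen only for illustration:

```php
<?php
$input = "один";

// With the u modifier the pattern works on UTF-8 characters.
$withU = preg_replace('%(.)(.)%us', '$2$1', $input);

// Without it the pattern works on single bytes, so the two bytes
// of every letter are swapped and the result is no longer valid UTF-8.
$withoutU = preg_replace('%(.)(.)%s', '$2$1', $input);

var_dump($withU === "дони");                     // bool(true)
var_dump(mb_check_encoding($withoutU, 'UTF-8')); // bool(false)
```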

See also: regular expressions in PHP, and the preg_replace() function.

This is a beautiful solution, but thanks to my ugly one I learned a lot about encodings.
