Hey.

A question about the fread() function in PHP. Its second argument is the number of bytes to read from the file.

As an example, I wrote the text 1Hello to the file data.txt in UCS-2 encoding. If I open this file in a hex editor, I see this:

[Screenshot: hex editor view of data.txt]
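The screenshot is not reproduced here, but the byte layout can be sketched. This assumes the file was saved as UCS-2 little-endian with a byte order mark (FF FE); a big-endian file would start with FE FF and put the zero byte before each letter instead. The helper name `ucs2Bytes` is hypothetical, and it only handles ASCII input, where each 2-byte code unit is simply the ASCII byte plus a zero byte.

```php
<?php
// Hypothetical helper: encode an ASCII-only string as UCS-2 with a BOM.
// For ASCII characters each 2-byte code unit is the ASCII byte plus 0x00;
// the byte order decides which of the two comes first.
function ucs2Bytes(string $ascii, bool $bigEndian = false): string
{
    $out = $bigEndian ? "\xFE\xFF" : "\xFF\xFE"; // byte order mark
    foreach (str_split($ascii) as $ch) {
        $out .= $bigEndian ? "\x00" . $ch : $ch . "\x00";
    }
    return $out;
}

$le = ucs2Bytes("1Hello");
echo strlen($le), " bytes: ", bin2hex($le), "\n";
```

Under the little-endian assumption the file is 14 bytes long, so a loop that reads it one byte at a time runs 14 times, and every second byte after the BOM is 0x00.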

I also wrote a script, 1.php (saved in UTF-8), that reads from the file ONE byte at a time:

    <?php
    $q = fopen("data.txt", "r");
    // Read the file one byte per iteration
    for ($i = 1; $i <= filesize("data.txt"); $i = $i + 1) {
        echo "$i: " . fread($q, 1);
        echo "<br/>";
    }
    fclose($q);
    ?>

It output this:

[Screenshot: browser output of 1.php]

I deliberately chose a two-byte encoding to see what I would get if fread() landed in the "middle" of a character's encoded code point. I do not understand what was output. In what encoding is it displayed? And what are these dots inserted between the letters? They can be seen in the picture.

    2 answers

    The fread() function reads data byte by byte. Here

     fread($q,1) 

    each time you read 1 byte and output it to the screen.
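To see exactly which byte each call returns, independently of how a browser renders it, you can print each byte in hex. A minimal sketch, assuming data.txt holds 1Hello as UCS-2 little-endian with a BOM (FF FE); since the real file is not available here, it writes that assumed content to a temporary file and uses `bin2hex()` and `ftell()` to show each byte's value and the file pointer position after each read.

```php
<?php
// Assumed content of data.txt: BOM FF FE, then "1Hello" as UCS-2 little-endian.
$path = tempnam(sys_get_temp_dir(), "ucs2");
file_put_contents($path, "\xFF\xFE1\x00H\x00e\x00l\x00l\x00o\x00");
clearstatcache(); // make sure filesize() sees the freshly written size

$q = fopen($path, "rb");
$dump = [];
for ($i = 1; $i <= filesize($path); $i++) {
    $b = fread($q, 1);            // one byte per call
    $dump[] = bin2hex($b);
    // ftell() shows that the file pointer advanced by exactly one byte
    echo "$i: 0x", bin2hex($b), " (pointer now at ", ftell($q), ")\n";
}
fclose($q);
unlink($path);
```

For the assumed content this prints 0xff and 0xfe first, then alternates between a letter's ASCII code and 0x00; those 0x00 bytes are what end up as gaps or dots on screen.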

    In what encoding is it displayed?

    Your output was displayed in ISO 8859-9 or Windows-1254 encoding, judging by the first two characters, with codes xFE and xFF (are you a Turkish spy? :))
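In a single-byte code page each byte maps directly to one character, and the code page the browser picks decides which one. A sketch with hand-written, deliberately partial tables (only the two BOM bytes; `$codePages` and `renderBytes` are illustrative names): in Windows-1254 and ISO 8859-9, 0xFE is ş, while in Windows-1252 it is þ; 0xFF is ÿ in all three. The byte order FF FE assumes a little-endian file.

```php
<?php
// Deliberately partial code-page tables: only the two BOM bytes,
// each mapped to the character that code page assigns to it.
$codePages = [
    "Windows-1254" => [0xFE => "\u{015F}", 0xFF => "\u{00FF}"], // ş, ÿ
    "ISO-8859-9"   => [0xFE => "\u{015F}", 0xFF => "\u{00FF}"], // ş, ÿ
    "Windows-1252" => [0xFE => "\u{00FE}", 0xFF => "\u{00FF}"], // þ, ÿ
];

// Map every byte through one table; "?" marks bytes the partial table lacks.
function renderBytes(string $bytes, array $table): string
{
    $shown = "";
    foreach (str_split($bytes) as $byte) {
        $shown .= $table[ord($byte)] ?? "?";
    }
    return $shown;
}

$bom = "\xFF\xFE"; // first two bytes of a little-endian UCS-2 file
foreach ($codePages as $name => $table) {
    echo $name, ": ", renderBytes($bom, $table), "\n";
}
```

The same two bytes come out as different characters depending only on the table, which is why the 0xFE character tells you the browser chose a Turkish code page.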

    What are these dots inserted between the letters?

    These are bytes that have no displayable character in the encoding the program is working with. In the first picture, your program supports displaying only characters with codes from x20 to x7F; the rest are shown as dots. In the second picture, the x00 bytes are displayed as blank spaces.
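The hex editor's text column can be imitated in a few lines: bytes in the printable ASCII range are shown as themselves, and every other byte, including 0x00 and the BOM bytes, as a dot. This is a sketch of the general idea rather than the exact rule of any particular editor; the function name `hexEditorText` is hypothetical, it treats 0x20 to 0x7E as printable, and the input assumes the little-endian layout with a BOM.

```php
<?php
// Hypothetical helper imitating a hex editor's text column:
// printable ASCII bytes are shown as-is, every other byte as ".".
function hexEditorText(string $bytes): string
{
    $out = "";
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= ($code >= 0x20 && $code <= 0x7E) ? $byte : ".";
    }
    return $out;
}

// Assumed content of data.txt: BOM FF FE, then "1Hello" as UCS-2 little-endian.
echo hexEditorText("\xFF\xFE1\x00H\x00e\x00l\x00l\x00o\x00"), "\n";
```

For the assumed content this gives `..1.H.e.l.l.o.`: two dots for the BOM, then a dot after every letter for its zero byte.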

    • Thanks for the answer, upvoted. I already know that fread() reads as many bytes as requested in the second parameter, and that each time I read one byte the file pointer moves forward by one byte. I deliberately saved data.txt in the two-byte UCS-2 encoding in order to land in the "middle" of a character and get some nonsense in the output. I want to understand why I got exactly what I got. The first two bytes are the BOM. They were not displayed in the Turkish extended encoding 1254; I checked, and it has other characters for xFE and xFF - Dimon
    • @Dimon, if you don't believe that the browser detected the encoding as Turkish, open the View > Text Encoding menu (or similar) in the browser and see what is selected there while the page from the second image is displayed. - Visman
    • Indeed, the encoding is Turkish. It's not clear why it was selected automatically - Dimon

    What is really incomprehensible here is how the question relates to PHP and the fread function.

    I do not understand what was output.

    Individual bytes.

    In what encoding is it displayed?

    IN NONE. Bytes are not encoded. The browser displays the character (if it can) whose binary code was sent to it, taking into account the encoding specified in the HTTP header.

    What are these dots inserted between the letters?

    This is exactly what you wanted to see: the very "nonsense" you get "if you land in the middle of a character." That is, you got exactly what you wanted, but you're still not happy.
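Conversely, if each read covers a whole 2-byte code unit, you never land in the middle of a character. A sketch, assuming the little-endian layout with a BOM: it skips the 2-byte BOM and decodes each pair with `unpack('v', ...)`, where `v` is PHP's format code for an unsigned 16-bit little-endian integer (`n` would be the big-endian equivalent). The function name `readUcs2Le` is hypothetical, and an in-memory stream stands in for data.txt.

```php
<?php
// Hypothetical helper: read a UCS-2 little-endian stream two bytes at a time.
function readUcs2Le($handle): array
{
    fread($handle, 2);                      // skip the BOM (FF FE)
    $units = [];
    while (strlen($pair = fread($handle, 2)) === 2) {
        $units[] = unpack("v", $pair)[1];   // 16-bit little-endian code unit
    }
    return $units;
}

// In-memory stand-in for data.txt, with the assumed UCS-2 LE content.
$q = fopen("php://memory", "w+b");
fwrite($q, "\xFF\xFE1\x00H\x00e\x00l\x00l\x00o\x00");
rewind($q);

$units = readUcs2Le($q);
fclose($q);
// All code units here are below 0x80, so chr() maps each back to ASCII.
echo implode("", array_map("chr", $units)), "\n";
```

Because every character of 1Hello is ASCII, each code unit equals its ASCII code, so `chr()` is enough to recover the text here; non-ASCII characters would need a real conversion.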

    The hex editor displays a zero byte as a dot, so as not to confuse it with a space.

    • Thanks for the answer, upvoted. Still, the individual bytes were displayed in some encoding. The question is: in what encoding were they displayed? These dots are visible only in the hex editor. In the browser, in the page source, I see only blank spaces (like spaces) - Dimon
    • You shouldn't just upvote the answer, you should read it. Everything is written in it - Ipatiev