I was reading the manual and came across these flags:

ENT_SUBSTITUTE, ENT_DISALLOWED

Please explain the fundamental difference between them; I really could not work it out from the manual.

  • ENT_SUBSTITUTE: replaces invalid code unit sequences with the Unicode replacement character U+FFFD (for UTF-8) or &#xFFFD; (for any other encoding) instead of returning an empty string. ENT_DISALLOWED: replaces code points that are invalid for the given document type with the Unicode replacement character U+FFFD (UTF-8) or &#xFFFD; (any other encoding) instead of leaving them as they are. This can be useful, for example, to ensure the well-formedness of XML documents with embedded external content. - ChromeChrome
  • Great! But I have already read that in the manual, and the text is almost the same for both flags. The only difference is at the end, and it is a very strange one - MaximPro
  • @ChromeChrome, that did not make it any clearer :D - Visman
  • Then try putting a string with various characters into a variable, switch the flags, and see what it outputs - ChromeChrome
  • @MaximPro, the difference is probably this: "If the input string contains an invalid code unit sequence within the given encoding, an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set." There is no mention of ENT_DISALLOWED there. - Visman
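To make the manual's wording in the comments above more concrete, here is a minimal sketch that feeds two different kinds of "bad" input to `htmlspecialchars()`. The variable names and the choice of `ENT_XML1` as the document type are assumptions made for illustration; the flags themselves are the ones from the question.

```php
<?php
// Input 1: a lone continuation byte (0x80), which is not a valid UTF-8 sequence.
$brokenBytes = "a\x80b";
// Input 2: valid UTF-8, but U+0001 is a code point that XML 1.0 does not allow.
$badCodePoint = "a\x01b";

// Without ENT_SUBSTITUTE (or ENT_IGNORE) a broken sequence yields an empty string.
$r1 = htmlspecialchars($brokenBytes, ENT_XML1, 'UTF-8');
// With ENT_SUBSTITUTE the broken byte becomes U+FFFD (bytes EF BF BD).
$r2 = htmlspecialchars($brokenBytes, ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');

// Without ENT_DISALLOWED the disallowed code point is passed through as is.
$r3 = htmlspecialchars($badCodePoint, ENT_XML1, 'UTF-8');
// With ENT_DISALLOWED it is replaced by U+FFFD.
$r4 = htmlspecialchars($badCodePoint, ENT_XML1 | ENT_DISALLOWED, 'UTF-8');

var_dump(bin2hex($r1), bin2hex($r2), bin2hex($r3), bin2hex($r4));
```

Printing the results through `bin2hex()` avoids relying on how a terminal or browser renders the replacement character.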

2 answers

With the ENT_SUBSTITUTE flag, byte sequences that cannot be interpreted as UTF-8 characters are replaced with U+FFFD, whereas with only ENT_DISALLOWED set such a sequence still makes the function return an empty string:

 <?php
 echo '<pre>';
 var_dump(htmlspecialchars("a\x80b", ENT_SUBSTITUTE)); // string(5) "a�b"
 var_dump(htmlspecialchars("a\x80b", ENT_DISALLOWED)); // string(0) ""
  • By the way, what does "invalid sequences" mean? - MaximPro
  • And the manual says that invalid sequences are replaced when the ENT_DISALLOWED flag is set, but in your example this does not happen... something is wrong here - MaximPro
  • @MaximPro UTF-8 characters have a fixed structure: in a single-byte character the first bit is 0, in a two-byte character the first three bits of the first byte are 110, and so on. The standard lists the prefixes for three- and four-byte characters as well. If a byte turns up without the prefix expected at its position, for example a byte that starts with a 1 bit where one starting with 0 was expected, the character is considered broken and cannot be interpreted as a UTF-8 character (such characters often appear when a Russian character consisting of two bytes is cut in half). - cheops
  • @MaximPro The manual is either poorly translated or contains an error, because it says almost the same thing about two different flags that correspond to two different modes: replacing a broken character, and returning an empty string when a broken character is found. - cheops
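The prefix rules from the comment above can be sketched as a small classifier. `utf8ByteKind()` is a hypothetical helper written for this illustration, not part of PHP; the sample string is "Пр" written as byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper: classifies a single byte by its UTF-8 prefix bits.
function utf8ByteKind(string $byte): string
{
    $b = ord($byte);
    if (($b & 0x80) === 0x00) return 'single';        // 0xxxxxxx
    if (($b & 0xC0) === 0x80) return 'continuation';  // 10xxxxxx
    if (($b & 0xE0) === 0xC0) return 'lead-2';        // 110xxxxx
    if (($b & 0xF0) === 0xE0) return 'lead-3';        // 1110xxxx
    if (($b & 0xF8) === 0xF0) return 'lead-4';        // 11110xxx
    return 'invalid';
}

// "Пр" is D0 9F D1 80 in UTF-8; cutting after 3 bytes splits "р" in half.
$cut = substr("\xD0\x9F\xD1\x80", 0, 3);

$lastByteKind = utf8ByteKind($cut[2]); // a lead byte with no continuation after it

// Flags are passed explicitly because the default flags differ between PHP versions.
$strict = htmlspecialchars($cut, ENT_COMPAT, 'UTF-8');
$subst  = htmlspecialchars($cut, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

var_dump($lastByteKind, bin2hex($strict), bin2hex($subst));
```

Here the string ends with a lead byte that promises a second byte which never arrives, which is exactly the "character cut in half" case.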

Just an example, for clarity:

 // \x80 is a broken byte sequence; \xef\xbf\xbf is U+FFFF, which HTML5 does not allow.
 echo htmlspecialchars("<\x80The End\xef\xbf\xbf>", ENT_HTML5 | ENT_DISALLOWED | ENT_SUBSTITUTE, 'UTF-8'); // &lt;�The End�&gt; (both replaced with U+FFFD)
 echo htmlspecialchars("<\x80The End\xef\xbf\xbf>", ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8'); // only \x80 is replaced with U+FFFD; U+FFFF is left as is
 echo htmlspecialchars("<\x80The End\xef\xbf\xbf>", ENT_HTML5 | ENT_DISALLOWED, 'UTF-8'); // empty string
  • As for |, please clarify the situation for me: it always seemed to me that this is the logical OR operator, although that one is written ||... I do know there is a bitwise OR operator and that it is written exactly as |. In short, explain how it works in this context. P.S. Sorry for the off-topic - MaximPro
  • Well, I figured out that it is the bitwise OR operation after all, but tell me by what principle one flag or the other gets selected here - MaximPro
  • @MaximPro, nothing like "one flag or the other gets selected" happens here; there is no "or" choice, there is a summation of all the flags specified in the function call. - Visman
  • I know that it is a summation, but the operation is called "or"... well, take your first example: does it mean that 3 flags will be in effect at once as the 2nd argument? ENT_HTML5, ENT_DISALLOWED, ENT_SUBSTITUTE - MaximPro
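The combining of flags discussed in the comments above can be sketched as follows. `hasFlag()` is a hypothetical helper introduced only for this illustration; a function receiving the flags can test each bit with `&` in a similar way, so all combined flags are in effect at once.

```php
<?php
// Bitwise OR sets the bits of every listed flag in one integer.
$flags = ENT_HTML5 | ENT_DISALLOWED | ENT_SUBSTITUTE;

// Hypothetical helper: true when every bit of $flag is set in $flags.
// Not meaningful for zero-valued constants such as ENT_HTML401.
function hasFlag(int $flags, int $flag): bool
{
    return ($flags & $flag) === $flag;
}

$hasSubstitute = hasFlag($flags, ENT_SUBSTITUTE); // set above
$hasDisallowed = hasFlag($flags, ENT_DISALLOWED); // set above
$hasIgnore     = hasFlag($flags, ENT_IGNORE);     // never set

// These two flags occupy different bits, so OR and addition give the same number here.
$sameAsSum = (ENT_DISALLOWED | ENT_SUBSTITUTE) === (ENT_DISALLOWED + ENT_SUBSTITUTE);

var_dump($hasSubstitute, $hasDisallowed, $hasIgnore, $sameAsSum);
```

This is why the operation reads like a "summation" even though the operator is called OR: nothing is chosen between the flags, their bits are merged.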