Friends, I am parsing a site with the Simple HTML DOM library, and I ran into the following situation: the email address on the site comes out in this form

string(90) "mailto:%72o%6di%6b%2d%73%63%6f%72p%69%6fn@%79%61n%64%65%78.ru" 

If you look at the page source ("Original Page View"), the a tag is written this way

 <a href="mail&#116;o:%72o%6d%69%6b%2ds%63or%70%69o%6e@y%61nd%65x.ru" title="">&#114;&#111;&#109;&#105;&#107;&#45;sc&#111;&#114;pio&#110;@&#121;&#97;&#110;d&#101;&#120;.&#114;&#117;</a> 

As I understand it, the link text is encoded as ASCII character codes, but the link itself is for some reason encoded differently, although in both cases the encoded value is romik-scorpion@yandex.ru

I have never come across this. Can I somehow decode them using PHP?

  • What is unclear here? After % comes the hexadecimal number of the character in the ASCII table, and after &# comes the decimal number of the character in the ASCII table. - nick_n_a

1 answer

URL encoding:

The URL standard uses the US-ASCII character set. This has a serious drawback: only Latin letters, digits, and a few punctuation marks are allowed. All other characters must be encoded, for example Cyrillic letters, letters with accents, ligatures, and ideographs. This encoding is described in RFC 3986 and is referred to as URL encoding, URL-encoded, or percent-encoding.
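As a small illustration of what percent-encoding does, here is a sketch that assumes the script file is saved as UTF-8 (the sample strings are chosen only for the example). Each byte outside the unreserved set is written as % followed by two hex digits:

```php
<?php
// Percent-encode a string: every byte that is not an unreserved
// character (letters, digits, "-", ".", "_", "~") becomes %XX.
// Assumes this file is saved as UTF-8, so each Cyrillic letter is two bytes.
$encoded = rawurlencode('привет');
var_dump($encoded); // string(36) "%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"

// Plain ASCII letters may also be written as %XX, which is what the
// obfuscated link does; decoders treat both spellings the same way.
var_dump(rawurldecode('%72omik') === 'romik'); // bool(true)
```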

To decode your link, use urldecode:

 <?php var_dump(urldecode('%72o%6d%69%6b%2ds%63or%70%69o%6e@y%61nd%65x.ru')); 
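The link text in the question uses numeric HTML entities (&#114; and so on) rather than percent-encoding, so urldecode alone will not touch it. A possible sketch that handles both forms is shown here; decodeMailto is a hypothetical helper name, the two input strings are copied from the a tag in the question, and rawurldecode is used instead of urldecode as an assumption that a literal "+" in an address should be left alone:

```php
<?php
// The two obfuscated values copied from the question's <a> tag.
$href = 'mail&#116;o:%72o%6d%69%6b%2ds%63or%70%69o%6e@y%61nd%65x.ru';
$text = '&#114;&#111;&#109;&#105;&#107;&#45;sc&#111;&#114;pio&#110;@&#121;&#97;&#110;d&#101;&#120;.&#114;&#117;';

// Hypothetical helper: undo the HTML entities first, then the percent-encoding.
function decodeMailto(string $value): string
{
    // Numeric entities such as &#116; become plain characters.
    $value = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Escapes such as %72 become plain characters; "+" is left untouched.
    $value = rawurldecode($value);
    // Drop the scheme, if present, to get the bare address.
    return (string) preg_replace('/^mailto:/i', '', $value);
}

var_dump(decodeMailto($href)); // string(24) "romik-scorpion@yandex.ru"
var_dump(decodeMailto($text)); // string(24) "romik-scorpion@yandex.ru"
```

The order matters for the href: the entity inside "mail&#116;o:" has to be decoded before the scheme can be recognized and stripped.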
  • I am not a supporter of cruelty, but these scrapers, like those who slavishly help them, ought to be impaled. - Ipatiev
  • @Ipatiev, and what did they do to you? - Kostiantyn Okhotnyk
  • @Ipatiev, or is it ideological hostility? :) - Kostiantyn Okhotnyk