My page is served with Header("Content-Type: text/html; charset=windows-1251"); and contains a textarea where text is entered (Russian and English letters, digits, line breaks and + are valid). Users may also enter hieroglyphs, Persian phrases and all sorts of special characters.

Problem: how do I remove the hieroglyphs, which arrive converted into numeric entities?

This is what I tried. Source text:

  sdfsdfsdf 早上 好 

 $sInputText=htmlspecialchars($sInputText,1251); 

This turns the hieroglyphs into entities: sdfsdfsdf &#26089;&#19978;&#22909;

How to remove these entities?

I tried a regular expression, but for some reason it does not work:

 $Text=preg_replace('/&(.+?);/','',$Text); 
  • The preg_replace() function takes 3 parameters, not 2: php.net/manual/ru/function.preg-replace.php - Visman
  • You are right, that was a typo; corrected. It still does not work: it cuts out only the & - Prog2010
  • Hmm, for me it removes the whole &...; and not just the &. - Visman
  • The regular expression you use is correct and should produce the expected result. The problem may be with the encoding of the text. What encoding does PHP use by default? - ReinRaus
  • At the beginning of the file I specify Header("Content-Type: text/html; charset=windows-1251"); Do I need to specify the encoding somewhere else? - Prog2010
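
One possible explanation for the "cuts out only the &" symptom can be sketched as follows. It assumes that the browser submitted the hieroglyphs as numeric references such as &#26089; and that htmlspecialchars() then escaped each leading & to &amp;. The variable names and the explicit 'cp1251' argument are illustrative, not taken from the question.

```php
<?php
// Assumption: the browser replaced characters that windows-1251 cannot
// represent with numeric character references before submitting the form.
$submitted = 'sdfsdfsdf &#26089;&#19978;&#22909;';

// htmlspecialchars() escapes the ampersand of every reference,
// because double_encode is true by default.
$escaped = htmlspecialchars($submitted, ENT_QUOTES, 'cp1251');

// The lazy pattern from the question stops at the first ';' it finds,
// which is the one that closes '&amp;', so only '&amp;' is removed.
$stripped = preg_replace('/&(.+?);/', '', $escaped);

echo $escaped, "\n";
echo $stripped, "\n";
```

Under these assumptions the second line printed is sdfsdfsdf #26089;#19978;#22909; which, viewed in a browser, looks as if only the & had been cut out.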

2 answers

 $Text=preg_replace('/&(.+?);/','',$Text); 

change to

 $Text = preg_replace('/&(amp;)?(.+?);/', '', $Text); 

UPD

 $Text = preg_replace('/&(amp;)?#\d+;/', '', $Text); 
  • @Prog2010, null has nowhere to come from here unless you pass null as the input. - Visman
  • Now it works, but not quite right: if the text contains a phrase that begins with & and ends with ;, it is removed as well, for example "& 1ea22 dsf;" - Prog2010
  • @Prog2010, well, then look at my answer for a more precise regular expression. - Visman
  • Thanks, everything works now. - Prog2010
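
The difference between the two patterns in this answer can be checked with a short sketch. The sample string is hypothetical: it combines an escaped reference, a plain reference and the "& 1ea22 dsf;" phrase from the comments, with brackets added only to make the result easier to read.

```php
<?php
// Hypothetical input: numeric references plus ordinary text that
// happens to start with '&' and end with ';'.
$text = 'abc&amp;#26089;&#19978;[& 1ea22 dsf;]xyz';

// First suggestion: removes anything between '&' and the next ';'.
$broad = preg_replace('/&(amp;)?(.+?);/', '', $text);

// Updated suggestion: removes only numeric references,
// with or without the extra 'amp;' left by double escaping.
$narrow = preg_replace('/&(amp;)?#\d+;/', '', $text);

echo $broad, "\n";   // abc[]xyz
echo $narrow, "\n";  // abc[& 1ea22 dsf;]xyz
```

The narrower pattern requires # followed by digits, so ordinary text containing & and ; is left alone.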

This code works for me:

 $text = "sdfsdfsdf &#26089;&#19978;&#22909;";
 $res = trim(preg_replace("|([&#0-9;]+?)|",'', $text));
 var_dump($res);

Sandbox

  • The regular expression is wrong. - ReinRaus
  • @ReinRaus, really? It works for me. Any idea why? Magic, maybe? - Roman Kozin
  • Maybe because your test text has too little data? Try it on the text "12 обезьян & 1 я сидели на ветке; #ололо #ololo" - ReinRaus
  • @ReinRaus, the example is taken from the question itself - Roman Kozin
  • @Prog2010, if you have amp left there, then you have some kind of double encoding, because &amp; is the & itself. - Visman
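
ReinRaus's objection can be reproduced with the test text from the comments. This is a sketch, assuming the source file is saved as UTF-8; the variable names are illustrative, and the stricter pattern is the one from the other answer.

```php
<?php
// Test text suggested in the comments: it contains digits, '&', ';'
// and '#', but no entities at all.
$sample = '12 обезьян & 1 я сидели на ветке; #ололо #ololo';

// Character-class pattern from this answer: removes every single
// '&', '#', ';' and digit, wherever it appears.
$loose = trim(preg_replace('|([&#0-9;]+?)|', '', $sample));

// Pattern from the other answer: removes only complete numeric
// references, so this text should come back unchanged.
$strict = preg_replace('/&(amp;)?#\d+;/', '', $sample);

var_dump($loose);
var_dump($strict === $sample);
```

The character class works on the example from the question only because that example contains nothing but entities after the Latin letters; on mixed text it also eats legitimate digits and punctuation.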