Suppose there is an array, for example $array = array('blacksite.com',...); listing prohibited sites that must be filtered out of text, for example by replacing each one with {Link deleted}. One could find candidates with a regexp and then check them with in_array(). But someone who knows about this protection can write blacksite*com or blacksite com, or put a space or * after any letter of the word blacksite. What is the most correct and most reliable way to filter this?

UPD: Thank you all! I will put all the answers to use, though I lean towards regexp as slightly better but slightly more expensive.

  • Some of the letters may also be written in Russian, with a note in the comments asking readers to retype them. They can also use link-shortening services or IP addresses. You will not filter everything out anyway. First you need to pin down the task and describe all the possible spellings; only then write the regex. - Mike
  • Maybe it's better to put a spam tag instead of a heap of tags with filters? - Visman
  • Yes, that solves the problem perfectly when the site is not on the prohibited list. But what if it is on the list and they post a fair number of entries, and a user follows the link and enters something there? There has to be some protection at least against the simple tricks with spaces, * and so on. - Red Woolf
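
For the look-alike letters mentioned in the comments, here is a rough sketch of one possible pre-processing step. The helper name fold_lookalikes and its mapping table are assumptions made for illustration; the table is far from complete and covers only a few lowercase Cyrillic letters.

```php
<?php
// Hypothetical helper: replace Cyrillic letters that look like Latin ones.
// The mapping is illustrative only and deliberately incomplete.
function fold_lookalikes(string $text): string
{
    $map = array(
        "\u{0430}" => 'a', // Cyrillic a
        "\u{0435}" => 'e', // Cyrillic ie
        "\u{043E}" => 'o', // Cyrillic o
        "\u{0441}" => 'c', // Cyrillic es
        "\u{0440}" => 'p', // Cyrillic er
        "\u{0445}" => 'x', // Cyrillic ha
        "\u{0456}" => 'i', // Cyrillic (Ukrainian) i
        "\u{043A}" => 'k', // Cyrillic ka
    );
    // strtr() with an array replaces whole substrings, so multi-byte keys are safe.
    return strtr($text, $map);
}

// Latin text with three Cyrillic letters mixed in.
$spoofed = "bl\u{0430}\u{0441}ksit\u{0435}.com";
echo fold_lookalikes($spoofed), "\n"; // prints: blacksite.com
```

Since this also rewrites genuine Russian text, it probably makes sense to run it only on a copy used for matching, not on the text that gets stored or displayed.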

2 answers

A bit off-topic.

Still, I disagree with @Naumov's answer. I think regular expressions are the better tool for blacklist filtering, especially since the regex functions run much faster in PHP 7.

Here is a test example:

    function r_no($text)
    {
        $text = strtolower($text); // The text loses its original form!!!
        $array = array(
            'blacksite',
            'blacksite1'
        );
        $text = str_replace($array, '****', $text);
        return $text;
    }

    function r_yes($text)
    {
        $array = array(
            '%blacksite%i',
            '%blacksite1%i'
        );
        $text = preg_replace($array, '****', $text);
        return $text;
    }

    // Sample input: the text of the original question, in Russian.
    $text = 'Есть к примеру массив $array = array(\'blacksite.com\',...); в котором указаны запрещённые сайты, которые нужно фильтровать из текста, например заменять из на {Ссылка удалена} и можно проверять к примеру через regexp и после проходом in_array(), но зная что есть такая защита могут написать и Blacksite*com и blackSite com или с пробелом или * после любой буквы в слове blacksite, как правильнее и лучше\надёжнее фильтровать?';

    $start = microtime(TRUE);
    for ($i = 0; $i < 100000; $i++) {
        $kk = r_no($text);
    }
    echo "<pre>\n";
    echo "str_replace time: ", microtime(TRUE) - $start, "\n";
    echo "</pre>\n";

    $start = microtime(TRUE);
    for ($i = 0; $i < 100000; $i++) {
        $kk = r_yes($text);
    }
    echo "<pre>\n";
    echo "preg_replace time: ", microtime(TRUE) - $start, "\n";
    echo "</pre>\n";

Benchmark result:

    str_replace time: 6.5153729915619 <-- HORRIBLE O_o
    preg_replace time: 0.18601012229919

If I comment out the line $text = strtolower($text); then I get the following result:

    str_replace time: 0.16600894927979
    preg_replace time: 0.18501091003418

PS Feel free to use regular expressions for this task;)
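
To cover the spellings from the question (a space or * after any letter), one option is to generate the pattern from the site name instead of writing it by hand. This is only a sketch: the helper name loose_pattern and the chosen set of separator characters (whitespace, *, ., _, -) are assumptions, not part of the benchmark above.

```php
<?php
// Hypothetical helper: build a case-insensitive pattern that matches $name
// even when separators are inserted between its letters.
function loose_pattern(string $name): string
{
    // Split into single characters (UTF-8 aware).
    $letters = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);
    // Escape each character, including the % delimiter.
    $quoted = array_map(function ($ch) {
        return preg_quote($ch, '%');
    }, $letters);
    // Allow any run of whitespace, *, ., _ or - between letters.
    return '%' . implode('[\s*._-]*', $quoted) . '%iu';
}

$text = 'go to b l*a.c-k s i t e com or Blacksite*com';
echo preg_replace(loose_pattern('blacksite'), '{Link deleted}', $text), "\n";
// prints: go to {Link deleted} com or {Link deleted}*com
```

The looser the pattern, the more false positives it produces: here an innocent "black site" in ordinary text would be replaced too, so the separator set is a trade-off rather than a fixed answer.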

  • I'm using PHP 7, thanks for the reply)) - Red Woolf
  • I increased the text to 10,000 characters, and preg_replace really does handle it faster - Red Woolf
  • sandbox.onlinephpfunctions.com/code/… You are being a little sly in judging the algorithm by elapsed time. That is fundamentally wrong, because many factors affect the running time: the power of the computer, the CPU load, the amount of RAM allocated. Run the script several times and see how the time jumps around. To evaluate an algorithm, analyze the data and the number of operations performed on it. In my case it is a linear function; in the case of a regex it is exponential in the length of the string. - Naumov
  • @Naumov, even if this is a quirk of my computer and not of the online emulator, you will still wear yourself out writing a filter with string functions compared with regular expressions :P - Visman
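
On the point that a single timing jumps around: one common way to reduce the noise is to repeat the measurement and keep the fastest run. A minimal sketch, assuming a hypothetical helper best_time and made-up sample text; it does not settle the argument about complexity, it only makes the numbers steadier.

```php
<?php
// Hypothetical helper: call $fn $iterations times, repeat that $runs times,
// and return the fastest run in seconds.
function best_time(callable $fn, int $iterations, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

// Made-up sample input, not the text used in the answer above.
$text = str_repeat('some text with Blacksite*com inside ', 50);

$plain = best_time(function () use ($text) {
    return str_ireplace('blacksite', '****', $text);
}, 10000);

$regex = best_time(function () use ($text) {
    return preg_replace('%blacksite%i', '****', $text);
}, 10000);

printf("str_ireplace: %.4f s\npreg_replace: %.4f s\n", $plain, $regex);
```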

I would do it like this:

In the array of forbidden links, store each link without http:, https: and the domain zone, and replace it with a plain str_replace:

    $arrayBlock = array(
        'blacksite',
        'blacksite1'
    );
    // Use the array defined above (the original passed an undefined $array).
    $text = str_replace($arrayBlock, '****', $text);

As a result we get things like http://***.com , *** .com , ****com and so on. This handles most cases, except spellings such as blacksite . com, but that case can be beaten too, for example:

    $text = str_replace(" ", "", $text);
    if (strpos($text, $blacksite) !== false) {
        echo 'forbidden';
        die;
    }

That is, simply forbid publishing any message that contains a prohibited link. For a high-load project it is better to sanitize with an array of symbols and then call strpos; if the project is not high-load, you can use a regex for both sanitization and comparison.
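
A minimal sketch of the "array of symbols, then strpos" idea, under a few assumptions: the helper name contains_forbidden and the list of noise characters are made up for illustration, and stripos is swapped in for strpos so that letter case does not matter.

```php
<?php
// Hypothetical helper: true if $text contains any name from $names once the
// separator characters listed in $noise have been stripped out.
function contains_forbidden(string $text, array $names): bool
{
    // Assumed set of characters people insert to dodge the filter.
    $noise = array(' ', "\t", "\n", '*', '.', '-', '_');
    $clean = str_replace($noise, '', $text);
    foreach ($names as $name) {
        // stripos() is the case-insensitive variant of strpos().
        if (stripos($clean, $name) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_forbidden('see Black*site . com', array('blacksite'))); // bool(true)
var_dump(contains_forbidden('hello world', array('blacksite')));          // bool(false)
```

Because the stripped copy is used only for the check, the original message stays untouched. The price is false positives when a forbidden name happens to appear across word boundaries.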

  • Someone can freely write Blacksite.com or blackSite.com in a message, readers will still follow the link to the right site, and neither str_replace nor strpos will catch it! Also, if you check when the message is saved rather than on output, a regex adds little load on top of the other text checks. - Visman
  • @Visman, a regex is a recursive character-by-character scan. Unfortunately, in PHP it is a function, not an operator. Searching within a 50-character string is fine, but with more characters, say 300-400, you get significant memory and CPU consumption. Still, your remark is constructive, and that problem is solved with strtolower. - Naumov
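
For the letter-case issue discussed in these comments, str_ireplace is another option besides strtolower: it matches case-insensitively and leaves the rest of the text as written. A small sketch, with a hypothetical helper name mask_names; sorting the longer names first is my own addition, so that blacksite1 is not cut down to ****1 by the shorter blacksite.

```php
<?php
// Hypothetical helper: mask every forbidden name, ignoring letter case.
function mask_names(string $text, array $names): string
{
    // Longest names first, so 'blacksite1' is handled before 'blacksite'.
    usort($names, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });
    return str_ireplace($names, '****', $text);
}

echo mask_names('Visit BlackSite.com or blacksite1.com', array('blacksite', 'blacksite1')), "\n";
// prints: Visit ****.com or ****.com
```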