How can I make a simple and reasonably fast check of text sent to the server for profanity (mat)? That is, the user enters bad words, and the text is checked against a collection of bad words.
- You know about the low efficiency and false positives of such checks, right? - Kromster
- See, for example, how this is implemented in the Censure script, links: github.com/rin-nas/php-censure and forum.dklab.ru/viewtopic.php?p=166584 - Akina
- Another option: bally-36.livejournal.com/66906.html - upward
1 answer
Well, people have been fighting this problem for years. The biggest difficulty is people's inventiveness. Even if you add every kind of distortion in the spirit of "mlyat" to the bad-word database, people will still start inserting asterisks and other symbols. And in the end they simply come up with original insults (although if the problem is only profanity, that's fine). But if you just need to check words against a set of samples, then:
1) Use split("\\s+") to break the text into words. From there you could run each word through a dumb switch. Nicer is to loop over the array of swear words. Nicer still is a hash lookup. I don't know exactly how HashMap works internally, so I won't give detailed advice, but it is what people use when they want speed.
2) One huge regex that searches for swear words. This option is more interesting. The regex should, of course, be assembled from separate substrings so that it doesn't clutter the code. It will look nice, but I'm not sure about performance. If my suspicion that it is slow is wrong, let people say so in the comments.
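A minimal sketch of both options, under several assumptions: the word list is a harmless placeholder, the class and method names are hypothetical, and the text is split on non-letter characters rather than on whitespace only, so that "heck!" still yields the word "heck". It needs Java 9 or later for `List.of`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ProfanityCheck {
    // Hypothetical placeholder list; a real one would be loaded from a file or database.
    private static final List<String> BAD_WORDS = List.of("darn", "heck");

    // Option 1: a hash set built once, giving constant-time lookups on average.
    private static final Set<String> BAD_WORD_SET = new HashSet<>(BAD_WORDS);

    // Option 2: one alternation regex assembled from the same list and compiled once.
    // The lookarounds make sure a match is not part of a longer word.
    private static final Pattern BAD_WORD_PATTERN = Pattern.compile(
            "(?<![\\p{L}\\p{N}])(?:"
                    + BAD_WORDS.stream().map(Pattern::quote).collect(Collectors.joining("|"))
                    + ")(?![\\p{L}\\p{N}])",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Lowercases the text, splits it into words, and looks each word up in the set.
    static boolean containsBadWordBySet(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (BAD_WORD_SET.contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Scans the whole text with the precompiled pattern.
    static boolean containsBadWordByRegex(String text) {
        return BAD_WORD_PATTERN.matcher(text).find();
    }
}
```

Both the set and the pattern are static fields, so they are built once when the class loads rather than on every check. Which option is faster for a given list is something to measure, not assume.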
Along the way, don't forget tricks such as toLowerCase, so that you don't have to list extra variants with capital letters. The word list itself can also be generated automatically, for example prefix + swear word, then variants with the vowels replaced. Again, do this in code, not by hand, and do it once at initialization, not on every check. The check itself should be nothing more than a lookup among the possible variants. Then everything will fly.
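Generating the variants at initialization could look roughly like the sketch below. Everything in it is an assumption made for illustration: the roots and prefixes are harmless placeholders, the only "vowel replacement" modelled is masking a single vowel with `*`, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class BadWordVariants {
    // Hypothetical placeholder data; real roots and prefixes would come from your own list.
    private static final List<String> ROOTS = List.of("darn", "heck");
    private static final List<String> PREFIXES = List.of("", "un", "re");
    private static final String VOWELS = "aeiou";

    // Built once, when the class is loaded, not on every check.
    private static final Set<String> VARIANTS = buildVariants();

    private static Set<String> buildVariants() {
        Set<String> variants = new HashSet<>();
        for (String prefix : PREFIXES) {
            for (String root : ROOTS) {
                String word = prefix + root;
                variants.add(word);
                // Add one variant per vowel position, with that vowel masked by '*'.
                for (int i = 0; i < word.length(); i++) {
                    if (VOWELS.indexOf(word.charAt(i)) >= 0) {
                        variants.add(word.substring(0, i) + '*' + word.substring(i + 1));
                    }
                }
            }
        }
        return variants;
    }

    // The check itself is just a lowercase conversion and one set lookup.
    static boolean isBadWord(String word) {
        return VARIANTS.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

With this data, `isBadWord("Darn")`, `isBadWord("d*rn")` and `isBadWord("reh*ck")` all return true, while `isBadWord("dorn")` returns false, since only the generated variants are in the set.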
One more thing I forgot: the approach depends on the goal. If you have a serious application, a list of just a dozen or two swear words will stop 99% of people. More precisely, it usually works on people who wrote something in the heat of the moment and then think "well, swearing isn't allowed here, okay." For some run-of-the-mill forum I wouldn't bother much either. But if this is an important task for you, for example on a large children's site with a lot of schoolchildren, you will have to go to much greater lengths. And in general, I would go with appointing moderators instead of dealing with all these problems.
- Tell me, is it necessary to create a separate thread for the first solution? - upward
- No, there's no need. I doubt that a properly written function will take longer than, say, 100 ms, and the user won't notice that. You would probably lose more time starting the thread (yes, yes, starting a thread takes time too). - Uraty