On a multilingual site, I need to validate the data entered in one field. The string must consist of letters only. How can I match Russian, German, French, English and Spanish letters with a regular expression?

/^[^\W]+$/i does not work.

UTF-8 encoding.

PHP5

  • Unicode support depends on the implementation of regular expressions. Can you tell me which language / library is used? - kirelagin
  • The language is PHP5. - Viktor
  • Oh, it is all quite complicated there. I will try to answer in detail now. To begin with: \w matches any word character (a letter, a digit or an underscore), and \W matches anything that is not a word character. See the problem? Your regexp matches a whole lot of junk! - kirelagin
  • That is why I am asking. I know that this regex is not right. - Viktor
  • No, really, it is all quite complicated... I do not have the energy to describe everything in detail. If you have any specific questions, ask. - kirelagin
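To illustrate the point made in the comments, here is a minimal sketch, assuming the pattern from the question was meant as /^[^\W]+$/i. The variable names are made up for the example.

```php
<?php
// [^\W] is a double negation ("not a non-word character"), so it is just \w:
// letters, digits and the underscore.
$pattern = '/^[^\W]+$/i';

// Digits and underscores slip through a check that was meant for letters only.
$acceptsJunk = preg_match($pattern, 'abc_123');

// Without the u modifier PCRE looks at bytes rather than UTF-8 characters,
// so whether Cyrillic passes depends on the locale; with the default "C"
// locale it is typically rejected.
$acceptsCyrillic = preg_match($pattern, 'Привет');

var_dump($acceptsJunk, $acceptsCyrillic);
```

So the pattern is both too permissive (digits, underscore) and, without Unicode support, too strict for non-ASCII letters.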

4 answers

  1. You need the u modifier.
  2. /^(?>\pL\pM*)+$/u
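A minimal sketch of how this pattern might be used for validation in PHP. The helper name isLettersOnly is hypothetical, and the input is assumed to be UTF-8, as stated in the question.

```php
<?php
// Hypothetical helper: true if $value consists only of Unicode letters,
// each optionally followed by combining marks.
function isLettersOnly($value)
{
    // \pL = any letter, \pM* = any combining marks, u = treat input as UTF-8.
    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example, invalid UTF-8), so compare strictly with 1.
    return preg_match('/^(?>\pL\pM*)+$/u', $value) === 1;
}

var_dump(isLettersOnly('Привет'));  // Russian
var_dump(isLettersOnly('Straße'));  // German
var_dump(isLettersOnly('garçon'));  // French
var_dump(isLettersOnly('niño'));    // Spanish
var_dump(isLettersOnly('abc123'));  // contains digits
```

The first four calls should report true and the last one false, since digits are not in the letter category.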

  • \pL matches any letter: n, w, á
  • \pM* matches any number of combining marks. This is necessary because the letter á can be written as one character, or as two: a followed by a combining acute accent.
  • This expression matches a sequence of valid Unicode letters. It does not filter by language, as you can see. You could only add a more elaborate expression that matches by script, Cyrillic and Latin (to exclude hieroglyphs, say). - kirelagin
  • Great! It works. Thanks. Filtering by language is not that important. - Viktor
  • In the regular expression I changed ?: to ?>. That is more correct. - kirelagin
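The two points from this answer can be sketched as follows: why \pM* is needed, and how a restriction to Latin and Cyrillic might look. The script-restricted pattern is an assumption of mine, not something the answer spells out; it relies on PCRE's \p{Latin} and \p{Cyrillic} script properties.

```php
<?php
$pattern = '/^(?>\pL\pM*)+$/u';

// The same visible letter "á" in two forms, written as UTF-8 bytes:
$composed   = "\xC3\xA1";   // U+00E1, a single code point
$decomposed = "a\xCC\x81";  // "a" followed by U+0301 COMBINING ACUTE ACCENT

$okComposed   = preg_match($pattern, $composed) === 1;
$okDecomposed = preg_match($pattern, $decomposed) === 1;

// Without \pM* the decomposed form fails: the accent is a mark, not a letter.
$okWithoutMarks = preg_match('/^\pL+$/u', $decomposed) === 1;

// Assumed variant: each character must be a letter (\pL lookahead)
// and belong to the Latin or Cyrillic script.
$latinOrCyrillic = '/^(?>(?=\pL)[\p{Latin}\p{Cyrillic}]\pM*)+$/u';
$okRussian = preg_match($latinOrCyrillic, 'Привет') === 1;
$okSpanish = preg_match($latinOrCyrillic, 'niño') === 1;
$okChinese = preg_match($latinOrCyrillic, '漢字') === 1;

var_dump($okComposed, $okDecomposed, $okWithoutMarks);
var_dump($okRussian, $okSpanish, $okChinese);
```

Note that this still filters by script, not by language: any Latin-script or Cyrillic-script letter passes.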

I solved this for many languages with roughly the following scheme:

// One string of allowed letters per language (elided here, as in the original post).
$lang[] = 'абвгдеёжзиклмн...эюяАБВ...ЭЮЯ';

$lang[] = 'abcde...ßABCDE...';

...

$lang[] ...

// Collect every character of $str that is in the combined whitelist.
// The u modifier makes PCRE treat the pattern and the subject as UTF-8.
preg_match_all('/[a-zA-Z' . implode('', $lang) . ']/isu', $str, $out);

This is the only scheme that worked flawlessly; all the others had problems recognizing some of a language's characters.

  • That simply means you were not patient enough to understand what is actually going on. Naturally, you can always invent a workaround. But if you figured out how Unicode works in regexps, life would be much more fun ;). - kirelagin
  • When there is time, I spend it on such things, but when the project has to be delivered "yesterday", there is no time and you have to resort to hacks :) - Alex Silaev

PHP has filters; dig in that direction, there was an article about them on Habr. In essence, they are regexes wrapped in functions. Most likely there is a filter for letters.
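As far as I know, the filter extension has no dedicated letters-only filter, but FILTER_VALIDATE_REGEXP accepts a custom pattern. A minimal sketch, with the hypothetical helper name validateLetters and the \pL pattern borrowed from the other answer:

```php
<?php
// Hypothetical helper: returns the value if it consists only of letters,
// or false otherwise (this is how FILTER_VALIDATE_REGEXP reports failure).
function validateLetters($value)
{
    $options = array(
        'options' => array('regexp' => '/^(?>\pL\pM*)+$/u'),
    );
    return filter_var($value, FILTER_VALIDATE_REGEXP, $options);
}

var_dump(validateLetters('Привет'));
var_dump(validateLetters('abc1'));
```

Because the function returns either the original string or false, compare the result with === false rather than relying on truthiness.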

    As I understand it, the problem is already solved. So this is just a follow-up.

    /^[^W]+$/i - does not work.

    It seems it should not work, either. This matches either strings made of characters other than W, or strings made of characters other than W and \. Inside [] you need to specify characters, ranges or classes. I have not worked with PHP myself, but Google suggested:

    POSIX character classes: [:digit:] [:alnum:] [:alpha:] [:blank:] [:xdigit:] [:punct:] [:print:] [:space:] [:graph:] [:upper:] [:lower:] [:cntrl:]
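For reference, a short sketch of how a POSIX class is written in a PHP (PCRE) pattern. The variable names are illustrative; the comparison with \pL is my addition to tie this back to the question.

```php
<?php
// A POSIX class is only valid inside a bracket expression,
// so the syntax is [[:alpha:]], not [:alpha:] on its own.
$asciiLetters = '/^[[:alpha:]]+$/';

$okAscii  = preg_match($asciiLetters, 'abc') === 1;   // plain ASCII letters
$okDigits = preg_match($asciiLetters, 'abc1') === 1;  // a digit is not alpha

// For letters of any script, the Unicode property \pL with the u modifier
// does not depend on locale settings.
$okUnicode = preg_match('/^\pL+$/u', 'Привет') === 1;

var_dump($okAscii, $okDigits, $okUnicode);
```

How [[:alpha:]] treats non-ASCII letters depends on the locale and on whether the u modifier is used, which is why \pL is the more predictable choice for a multilingual field.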

    • @kirelagin Please always be polite. If you disagree with a participant's opinion, vote against their answer or flag it. - Nicolas Chabanovsky ♦
    • @HashCode Just voting is usually not enough: it is also necessary to explain the error and suggest ways to fill in the missing knowledge in the area under discussion. It seems to me that I did both, and within the bounds of decency. - kirelagin
    • The flame does not bother me much. But what I was trying to point out, the difference between "\xxx" escapes and "[]" character classes, was left out of the picture. Once upon a time this caused me difficulties myself. @HashCode I am new here, and if I knew how, I would reply by e-mail. - alexlz
    • @alexlz It does not bother me much either, but your apparent ignorance of what "\w" means does. Please do reply by e-mail. If you click on my signature, a page with information will open. I do not know how the privacy settings work here, but if you do not see my e-mail address there, the site address is certainly visible, and from there it is pretty easy to find the address. Let's discuss regular expressions. - kirelagin