There is some text:

Здравствуйте, Helpdesk. 😉 😉 © 😉 ™ С уважением, 

How can ruby ​​clean it up from all unnecessary characters (such as emoticons and other things) so that only pure text is left (letters, punctuation marks, spaces, line breaks, tabulation)?

    2 answers 2

    The String#delete method exists approximately for this: it takes as an input patterns for characters and cuts from the string those characters that fit all at once .

     "hello".delete "l","lo" #=> "heo" "hello".delete "lo" #=> "he" "hello".delete "aeiou", "^e" #=> "hell" "hello".delete "ej-m" #=> "ho" 
    • jm is the letters jklm , this is the range
    • ^ at the beginning of the pattern this is a negation, so that “everything except” matches the pattern
    • the order of characters in the pattern is not taken into account

    Keep in mind that it is not destructive, it will return a copy of your string. To convert "on the spot" you need to delete! The arguments are identical.

    All you have to do is formulate the definition of "clear text" (excluding smiles from it is dangerous 😉) as patterns for this function.

    • and how dangerous to exclude smiles? The problem is that the text with these characters is not inserted into the database: _Mysql2 :: Error: Incorrect string value: '\ xF0 \ x9F \ x98 \ x89 \ xC2 ...' for column 'subject - S.Ivanov
    • one
      @ S.Ivanov users will be against it, which is why You don’t need to repair user input, but rather a base :D Set UTF-8 encoding for fields in it (collation if desired, I advise utf8_bin , but test on your needs before accepting decision). - D-side

    specifically delete these characters like this:

     str = '😉 © ™' a = str.tr('😉','').tr('©','').tr('™','') b = str.gsub(/[\u{1F609}\u2122\u00A9]/,'') 

    or if you leave only Russian / English letters + numbers and spaces then like this

     a = str.gsub(/[^\p{L}\s\d]/,'') 

    remove characters so that your string is recorded in db mysql UTF-8 with utf8_general_ci like this:

     c = str.gsub(/[\u{10000}-\u{FFFFF}]/,'')