I ran into strange regular expression behavior in R: R treats some letters of the Cyrillic alphabet as punctuation, and the match goes wrong. Question: how can I work around this oddity to remove single letters from the text? The code below demonstrates the problem:
gsub(pattern = "\\b[:alpha:]{1}\\b", replacement = " ", x = " - 1 , очковый оцинкованный ёж з z ZZ 123", ignore.case = TRUE)
gsub(pattern = "\\b[a-zа-ячё]{1}\\b", replacement = " ", x = " - 1 , очковый оцинкованный ёж з z ZZ 123", ignore.case = TRUE)
[:alpha:] should be written as [[:alpha:]]. The second regular expression can also be written as "\\b[a-zа-яё]\\b" (ч is already included in the а-я range). - Wiktor Stribiżew
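For what it's worth, here is a minimal sketch of one way to sidestep the word-boundary problem. It is not from the original post. It assumes R's PCRE engine (perl = TRUE) was built with Unicode property support, which is the case for typical builds, and uses the (*UCP) verb so that \b is treated as Unicode-aware. The helper name remove_single_letters is hypothetical.

```r
# Hypothetical helper (not from the original post): replace every standalone
# letter with a space, for Latin and Cyrillic alike.
remove_single_letters <- function(x) {
  # Convert and mark the input as UTF-8 so that PCRE handles
  # non-ASCII text in Unicode mode regardless of the session locale.
  x <- enc2utf8(x)
  # (*UCP) makes \b Unicode-aware; \p{L} matches any single Unicode letter.
  gsub("(*UCP)\\b\\p{L}\\b", " ", x, perl = TRUE)
}

x <- " - 1 , очковый оцинкованный ёж з z ZZ 123"
print(remove_single_letters(x))
```

With this pattern the standalone з and z should be replaced by spaces, while longer words, ZZ and the digits are left alone. Results can still depend on the platform and on how the source file is encoded, so it is worth checking on your own setup.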