Good evening. I have a problem with a regex and file encoding. There is a PHP script:

    <?
    # If the parameter is not passed
    if (!defined('UF_AUTOLOADER_SETUP')) die(header('Location: /404/'));

    # Variables
    $message = urldecode($project['text']);

    # Check
    if(!preg_match('/^[^\.][0-9a-zA-Zа-яёА-ЯЁА\s\-\.\,\_\!\?\(\)]+[^\.]/', $message)) die(json_encode(Array('type' => 'error', 'errid' => '024')));

    # Check
    if(empty($project['id']) || empty($message) || empty($project['captcha'])) die(json_encode(Array('type' => 'error', 'errid' => '004', 'critical' => true)));

    # Check for emptiness.
    $message = trim($message);
    if(!empty($message)){
        # Replace [br/]
        $test = str_replace('[br/]', '', $message);
        $message = str_replace('[br/]', '%BR%', $message);

        # Check
        if($test == '') die(json_encode(Array('type' => 'error', 'errid' => '024')));

        # Parse empty BB codes
        $confing = new confing();
        if($confing->parser($message, 'check') == 0){
            # Emptiness checks.
            if($confing->parser($message, 'remove') == '') die(json_encode(Array('type' => 'error', 'errid' => '024')));
        # Empty BB codes
        } else die(json_encode(Array('type' => 'error', 'errid' => '024')));

        # Restore %BR%
        $message = str_replace('%BR%', '[br/]', $message);

        # Add the comment
        $add = $db->insert(
            "INSERT INTO `umbrella_forum_messages` (`uid`, `tid`, `fid`, `message`, `time`, `tm`) VALUES (?, ?, ?, ?, ?, ?)",
            Array($_SESSION['id'], $project['id'], $topic['fid'], htmlspecialchars($message), date("Ymd H:i:s"), 't')
        );

        # Response
        die(json_encode(Array('request' => 'forum', 'type' => 'newmessage', 'status' => true, 'mid' => intval($add), 'id' => intval($project['id']), 'overpage' => $page)));
    } else die(json_encode(Array('type' => 'error', 'errid' => '024')));

The message itself is sent via JS (in JSON format). The JS file is encoded in UTF-8 and the PHP file in windows-1251. When I send a message, this check returns an error:

 if(!preg_match('/^[^\.][0-9a-zA-Zа-яёА-ЯЁА\s\-\.\,\_\!\?\(\)]+[^\.]/', $message)) die(json_encode(Array('type' => 'error', 'errid' => '024'))); 

I tried converting the encoding:

 $message = iconv("UTF-8", "windows-1251", urldecode($project['text'])); 

As a result, all Russian characters simply disappeared and an empty message went into the database ... What exactly am I doing wrong, or where is the error in the PHP code?

  • It's 2016, and you are still working in windows-1251? P.S. What do you want to check with the condition if(!preg_match('/^[^\.][0-9a-zA-Zа-яёА-ЯЁА\s\-\.\,\_\!\?\(\)]+[^\.]/', $message)) ? - Visman
  • Yes, I know that UTF-8 is what should be used now ... With preg_match I am trying to check the text so that it contains no stray characters, so that people cannot just post spaces or use Alt+0160 ... - in1

1 answer

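The point is that preg_match, like preg_match_all, compares bytes unless the pattern has the u modifier, in which case both the pattern and the subject are treated as UTF-8. Your message arrives in UTF-8, while the Cyrillic ranges in your pattern are stored as windows-1251 bytes, so they cannot match. One way or another, your check is rather strict =)))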
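To make the encoding point concrete, here is a minimal sketch of how the u modifier changes matching. It assumes the snippet is saved as a UTF-8 file; the sample strings and variable names are invented for the illustration and are not part of the original script.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 bytes.
$word = 'Привет';

// Without /u, PCRE works byte by byte. Each Cyrillic letter takes two bytes
// in UTF-8, so "." matches 12 times for this 6-letter word.
$bytes = preg_match_all('/./', $word);   // 12

// With /u, the pattern and the subject are treated as UTF-8,
// so "." matches whole characters.
$chars = preg_match_all('/./u', $word);  // 6

// With /u, a Cyrillic range is a range of characters, not of bytes.
$isCyrillic = preg_match('/^[а-яё]+$/iu', $word) === 1;  // true

// A windows-1251 string ("Пр" as the bytes CF F0) is not valid UTF-8,
// so with /u preg_match() returns false instead of 0 or 1.
$cp1251 = "\xCF\xF0";
$result = preg_match('/./u', $cp1251);   // false

var_dump($bytes, $chars, $isCyrillic, $result);
```

This is why mixing the two encodings fails in either direction: the pattern and the subject have to be in the same encoding, and with /u that encoding has to be UTF-8.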

I find it difficult to understand exactly what you are testing with this expression if(!preg_match('/^[^\.][0-9a-zA-Zа-яёА-ЯЁА\s\-\.\,\_\!\?\(\)]+[^\.]/', $message))

But try this:

if (!preg_match('/^[^.][\w\s\-.\,!\?()]+[^.]/iu', $message))
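A small sketch of how such a check might be wrapped and tried on sample input. The function name isAcceptableMessage and the extra whitespace test are assumptions added for illustration, based on the goal stated in the comments (rejecting messages made only of spaces or Alt+0160); they are not part of the original script. The hyphen in the character class is escaped so the pattern also compiles on newer PCRE versions.

```php
<?php
// Hypothetical helper, not part of the original script.
// Assumption: $message arrives as UTF-8 and this file is saved as UTF-8.
function isAcceptableMessage(string $message): bool
{
    // Reject messages that consist only of whitespace or
    // non-breaking spaces (U+00A0, typed as Alt+0160).
    // On invalid UTF-8 preg_match() returns false, which is handled below.
    if (preg_match('/^[\s\x{00A0}]*$/u', $message) === 1) {
        return false;
    }

    // Pattern from the answer: the first character is not a dot, followed by
    // word characters, whitespace and basic punctuation, then a non-dot character.
    // "=== 1" also treats the false returned for invalid UTF-8 as a rejection.
    return preg_match('/^[^.][\w\s\-.,!?()]+[^.]/iu', $message) === 1;
}

var_dump(isAcceptableMessage('Привет, мир!'));             // true
var_dump(isAcceptableMessage("\u{00A0}\u{00A0}\u{00A0}")); // false
var_dump(isAcceptableMessage('   '));                      // false
var_dump(isAcceptableMessage('.hidden'));                  // false
```

Note that the pattern has no $ anchor, so it only constrains the beginning of the message; whether the whole text should be validated is a separate decision.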

  • Thank you so much, Stanislav. It helped! - in1