I have PHP code that translates text from Ukrainian into Russian and vice versa by replacing the words of one language with the corresponding words of the other:
    $oldstring = 'Мова, з якої здійснюється переклад';

    $words = array(
        array('ua' => 'з', 'ru' => 'с'),
        array('ua' => 'здійснюється', 'ru' => 'осуществляется'),
        array('ua' => 'мова', 'ru' => 'язык'),
        array('ua' => 'переклад', 'ru' => 'перевод'),
        array('ua' => 'якої', 'ru' => 'которой')
    );

    // Build a lookup table: Ukrainian word => Russian word
    foreach ($words as $row) {
        $fndrep[$row['ua']] = $row['ru'];
    }

    // The lookahead captures an uppercase first letter (group 1)
    // and a lowercase letter right after it (group 2), if present
    $pattern = '~(?=([\x{0410}-\x{042F}]?)([\x{0430}-\x{044F}]?))\b(?i)(?:' . implode('|', array_keys($fndrep)) . ')\b~u';

    $newstring = preg_replace_callback($pattern, function ($m) use ($fndrep) {
        mb_internal_encoding('UTF-8');
        $lowm = $fndrep[mb_strtolower($m[0])];
        if ($m[1]) {
            // Uppercase first letter: capitalize the word, or uppercase all of it
            return ($m[2])
                ? mb_strtoupper(mb_substr($lowm, 0, 1)) . mb_substr(mb_convert_case($lowm, MB_CASE_LOWER), 1, mb_strlen($lowm))
                : mb_strtoupper($lowm);
        } else {
            return $lowm;
        }
    }, $oldstring);

    echo $newstring; // outputs "Язык, с которой осуществляется перевод"
The code works, but a number of problems remain:
- Most important: although text can generally be translated between Ukrainian and Russian word for word, there are of course many cases where the context has to be taken into account and the output needs a completely different word, or at least a different grammatical form of the word.
  Take the example from the code above: the Ukrainian word мова is feminine, while the corresponding Russian word язык is masculine, so the output is "Язык, с которой осуществляется перевод" (although it should be "Язык, с которого ...").
  Hence the question: what should be changed in the code so that the word base can hold both single words (for example, мова|язык) and whole expressions (for example, мова з якої|язык с которого)? And so that, when the code finds a whole expression rather than just a single word, it uses that expression for the translation.
- preg_replace_callback() in the code above removes the need to add each word to the database in both capitalized and lowercase form: the translated word is output in the same letter case as in the source text. But there are glitches: sometimes a word is output entirely in CAPITAL letters, although in the source text only its first letter is capitalized.
- The code does not handle hyphenated words. For example, it splits the Ukrainian word будь-який into two parts, translates який into Russian as который, and we get будь-который, although the word should be translated as a whole.
- There is a problem with the Ukrainian letter і (which corresponds to the Russian и): if a word in the source text begins with a capital І, the translated word for some reason begins with a lowercase и (this affects every word that starts with this letter, e.g. the words for "internet", "information", etc.).
- The code does not handle abbreviations with a dot, for example т.д., т.е. or similar (a solution has to take into account that a word at the end of a sentence must still be translated as a whole word, without the dot).
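One possible direction for the first four points, as a minimal sketch rather than a finished solution. It assumes PHP 7.4+ with mbstring, and every name in it (`$dict`, `buildPattern()`, `matchCase()`, `translate()`) is hypothetical and not part of the code above. The dictionary keys are assumed to be lowercase, a phrase key is assumed to be stored exactly as it appears in the text (including the comma), and the будь-який and інтернет entries are illustrative additions. Phrases are tried before single words by sorting the keys longest first, `\b` is replaced by Unicode-aware lookarounds that also treat a hyphen as part of a word, and the letter case is detected with mb_* functions instead of fixed code-point ranges, so І is recognized as a capital letter.

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical dictionary: lowercase Ukrainian word or phrase => Russian equivalent.
$dict = [
    'мова, з якої' => 'язык, с которого', // phrase entry, stored with its comma
    'мова'         => 'язык',
    'з'            => 'с',
    'якої'         => 'которой',
    'здійснюється' => 'осуществляется',
    'переклад'     => 'перевод',
    'будь-який'    => 'любой',
    'який'         => 'который',
    'інтернет'     => 'интернет',
];

// Build one alternation with the longest keys first, so that a phrase
// is tried before the single words it contains.
function buildPattern(array $dict): string
{
    $keys = array_keys($dict);
    usort($keys, fn($a, $b) => mb_strlen($b) <=> mb_strlen($a));
    $alt = implode('|', array_map(fn($k) => preg_quote($k, '~'), $keys));
    // A match must not be preceded or followed by a letter or a hyphen.
    return '~(?<![\p{L}\-])(?:' . $alt . ')(?![\p{L}\-])~iu';
}

// Copy the letter case of $source (lowercase, Capitalized or ALL CAPS) onto $target.
function matchCase(string $source, string $target): string
{
    $first = mb_substr($source, 0, 1);
    if ($first === mb_strtolower($first)) {
        return $target; // source starts with a lowercase letter
    }
    if (mb_strlen($source) > 1 && $source === mb_strtoupper($source)) {
        return mb_strtoupper($target); // source is entirely uppercase
    }
    return mb_strtoupper(mb_substr($target, 0, 1)) . mb_substr($target, 1);
}

function translate(string $text, array $dict): string
{
    return preg_replace_callback(buildPattern($dict), function (array $m) use ($dict) {
        $replacement = $dict[mb_strtolower($m[0])] ?? $m[0];
        return matchCase($m[0], $replacement);
    }, $text);
}

echo translate('Мова, з якої здійснюється переклад', $dict), "\n";
// Язык, с которого осуществляется перевод
echo translate('будь-який Інтернет', $dict), "\n";
// любой Интернет
```

Because the boundaries are lookarounds rather than `\b`, a key that ends in a dot, such as т.д., can also match, but the end-of-sentence case from the last point is not handled here.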
Google Translate API is available as a paid service
- stckvrw