How can I correctly add the following two conversions to my existing Ukrainian-to-Latin transliteration code without breaking its basic replacement algorithm?

  1. Convert the letters "Є", "Ї", "Й", "Ю", "Я" to "Ye", "Yi", "Y", "Yu", "Ya", respectively, but only when the letter is in the first position of a word.
  2. Convert the letter combination "зг" to "zgh" in any position in the word.

In all cases, the case of the following letter must be taken into account, so that something like "ЗГУРОВСКАЯ" does not come out as "ZghUROVSKAIA".
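
One way to add both rules without touching the existing pair and single-letter loops is a pre-pass that converts only these letters before the main replacement runs. This is a minimal sketch, assuming Python's `re` module is acceptable; `pre_pass`, `_fit_case` and `INITIAL` are hypothetical names, and the result still contains Cyrillic letters that the existing loops are expected to transliterate afterwards:

```python
import re

# Rule 1: word-initial forms, taken from the question's list.
INITIAL = {"Є": "Ye", "Ї": "Yi", "Й": "Y", "Ю": "Yu", "Я": "Ya"}


def _fit_case(latin, src, nxt):
    """Lower case for a lower-case source letter; otherwise ALL CAPS when the
    next character is upper case ("YA"), else title case as stored ("Ya")."""
    if src.islower():
        return latin.lower()
    return latin.upper() if nxt.isupper() else latin


def pre_pass(text):
    # Rule 1: Є/Ї/Й/Ю/Я not preceded by a word character or an apostrophe,
    # i.e. the first letter of a word; the lookahead captures the next char (or "").
    def initial(m):
        src, nxt = m.group(1), m.group(2)
        return _fit_case(INITIAL[src.upper()], src, nxt)

    text = re.sub(r"(?<![\w'’])([ЄЇЙЮЯєїйюя])(?=(\w?))", initial, text)

    # Rule 2: "зг" in any position. "з" keeps its own case; "г" becomes "gh"
    # or "GH" depending on the case of the "г" itself ("Зг" -> "Zgh", "ЗГ" -> "ZGH").
    def zgh(m):
        z, g = m.group(1), m.group(2)
        return ("Z" if z.isupper() else "z") + ("GH" if g.isupper() else "gh")

    return re.sub(r"([Зз])([Гг])", zgh, text)


print(pre_pass("ЗГУРОВСКАЯ Згуровская Язгуровская"))
# ZGHУРОВСКАЯ Zghуровская Yazghуровская
```

Because the pre-pass only emits Latin letters, the Cyrillic keys of the main pair and single-letter tables cannot match them a second time, so the rest of the algorithm stays as it is.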

About the replacement algorithm:

Ukrainian letters that are represented by several Latin characters are kept in a separate dictionary. From it, a second dictionary (transliteration table) of character pairs is built, in which each of these letters (Є, Ж, Х, Ц, Ч, Ш, Щ, Ю and Я) is combined with every lowercase letter of the Russian alphabet. Accordingly, these character pairs are replaced first, and then all the remaining, not yet transliterated characters. Each letter that corresponds to several Latin letters and is followed by a lowercase letter ([а-я]) is replaced by the Latin representation of that letter, followed by the same lowercase letter (not yet transliterated). After that, the remaining lowercase letters are replaced separately, and so are the capital letters. When replacing capital letters that correspond to several Latin characters, the upper() method is applied to the Latin representation of the letter. In the two loops that do not use regular expressions, the replacement is done with the string replace method.
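
The pair-table mechanism described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the code from the gist: the tables are abbreviated (letters missing from them pass through unchanged), `MULTI`, `SINGLE`, `PAIRS` and `translit` are hypothetical names, and position-dependent forms such as "ia" vs "ya" are ignored:

```python
# Hypothetical, abbreviated tables (not the gist's actual data).
MULTI = {"Є": "Ye", "Ж": "Zh", "Х": "Kh", "Ц": "Ts", "Ч": "Ch",
         "Ш": "Sh", "Щ": "Shch", "Ю": "Yu", "Я": "Ya"}
SINGLE = {"А": "A", "Б": "B", "В": "V", "Г": "H", "Ґ": "G", "Д": "D",
          "Е": "E", "З": "Z", "И": "Y", "І": "I", "Й": "Y", "К": "K",
          "Л": "L", "М": "M", "Н": "N", "О": "O", "П": "P", "Р": "R",
          "С": "S", "Т": "T", "У": "U", "Ф": "F"}
ALL = {**MULTI, **SINGLE}
LOWER = [cap.lower() for cap in ALL]

# Pair table: capital multi-character letter + lowercase letter ->
# title-case Latin + the same lowercase letter, left for a later step.
PAIRS = {cap + low: latin + low for cap, latin in MULTI.items() for low in LOWER}


def translit(text):
    # 1) pairs first: "Жу" -> "Zhу"
    for src, dst in PAIRS.items():
        text = text.replace(src, dst)
    # 2) all lowercase letters
    for cap, latin in ALL.items():
        text = text.replace(cap.lower(), latin.lower())
    # 3) remaining capitals, upper()'d: a lone "Ж" in "ЖУК" -> "ZH"
    for cap, latin in ALL.items():
        text = text.replace(cap, latin.upper())
    return text


print(translit("Жук ЖУК"))  # Zhuk ZHUK
```

Since every replacement produces only Latin characters, a later loop can never re-match the output of an earlier one, which is what makes the fixed order of the three loops safe.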

https://gist.githubusercontent.com/popoff/3ba4e1fd259eec19163a81221dc8609d/raw/f39c6041427ec06beebb6674ef140eb06af28ce4/ukr_translit_v1.py

  • Interesting story. Now formulate your question. - VladD
  • Done, I've formulated it. Thank you - popoff
  • The algorithm for replacing individual characters can be replaced with translated = text.translate(dict(zip(map(ord, "ЄЖ"), ('Ye', 'Zh')))). By the way, if the solution in the answer worked for you, mark it as accepted. - jfs
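
To illustrate the comment's suggestion: `str.translate` accepts a mapping from code points to replacement strings of any length, and `str.maketrans` builds the same mapping from a dict with one-character keys. The letter subset below is only an example, not a full table:

```python
# Example subset of a single-letter table (not the full alphabet).
letters = {
    "Є": "Ye", "Ж": "Zh", "У": "U", "К": "K",
    "є": "ie", "ж": "zh", "у": "u", "к": "k",
}

# The form from the comment: map each code point to its replacement string.
table = dict(zip(map(ord, letters), letters.values()))
# str.maketrans builds the same {code point: string} mapping from the dict.
assert table == str.maketrans(letters)

print("Жук".translate(table))  # Zhuk
print("ЖУК".translate(table))  # ZhUK
```

Because `translate` looks at one character at a time, it cannot choose between "ZH" and "Zh" on its own ("ЖУК" gives "ZhUK"), so the pair step for capital letters is still needed before it.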

1 answer

If your transliteration has to follow the official Cabinet of Ministers resolution ( http://zakon4.rada.gov.ua/laws/show/55-2010-%D0%BF?test=4/UMfPEGznhhAUw.ZibfoNSdHI4cYs80msh8Ie6 ), then some of your replacements are wrong. You also need to take into account the position of the letter in the word, and the presence of a hyphen. I can offer an approximate version (but without regular expressions; I haven't mastered them yet).

    # Official table. Upper-case keys hold the word-initial forms ('Я' -> 'Ya'),
    # lower-case keys the forms used inside a word ('я' -> 'ia').
    # 'І'/'і' are the Cyrillic letters (U+0406/U+0456), not Latin I/i.
    table = {
        'А': 'A', 'Б': 'B', 'В': 'V', 'Г': 'H', 'Ґ': 'G', 'Д': 'D', 'Е': 'E',
        'Є': 'Ye', 'Ж': 'Zh', 'З': 'Z', 'Зг': 'Zgh', 'ЗГ': 'ZGH', 'И': 'Y',
        'І': 'I', 'Ї': 'Yi', 'Й': 'Y', 'К': 'K', 'Л': 'L', 'М': 'M', 'Н': 'N',
        'О': 'O', 'П': 'P', 'Р': 'R', 'С': 'S', 'Т': 'T', 'У': 'U', 'Ф': 'F',
        'Х': 'Kh', 'Ц': 'Ts', 'Ч': 'Ch', 'Ш': 'Sh', 'Щ': 'Shch', 'Ю': 'Yu',
        'Я': 'Ya',
        'а': 'a', 'б': 'b', 'в': 'v', 'г': 'h', 'ґ': 'g', 'д': 'd', 'е': 'e',
        'є': 'ie', 'ж': 'zh', 'з': 'z', 'зг': 'zgh', 'и': 'y', 'і': 'i',
        'ї': 'i', 'й': 'i', 'к': 'k', 'л': 'l', 'м': 'm', 'н': 'n', 'о': 'o',
        'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'u', 'ф': 'f', 'х': 'kh',
        'ц': 'ts', 'ч': 'ch', 'ш': 'sh', 'щ': 'shch', 'ю': 'iu', 'я': 'ia',
        'Ь': '', 'ь': '', "'": "",
    }


    def upperlower(z):
        """z = (letters, case flag); inside a word the lower-case keys are used."""
        if z[1] == 0:
            return table[z[0].lower()].lower()
        if z[1] == 1:
            return table[z[0].lower()].upper()


    def ua_in_lat(text):
        """Transliterate a single word (no spaces or hyphens)."""
        if not text:
            return text
        text_list = []  # items: (letter or 'зг' pair, case flag)
        n = 0           # 1 = this char was already taken as part of a 'зг' pair
        for i in range(len(text)):
            if text[i].islower():
                i1 = 0
            else:
                i1 = 1
            if i != len(text) - 1 and (text[i] + text[i + 1]).lower() == 'зг':
                if text[i + 1] == 'г':  # 'Зг' in a capitalized word -> lower case
                    i1 = 0
                text_list.append((text[i] + text[i + 1], i1))
                n = 1
            elif n == 1:
                n = 0
                continue
            else:
                text_list.append((text[i], i1))
        # First letter: looked up with its own case (word-initial forms),
        # fully upper-cased if the next letter is upper case.
        if len(text_list) > 1 and text_list[1][1]:
            text_lat_def = table[text_list[0][0]].upper()
        else:
            text_lat_def = table[text_list[0][0]]
        for i in text_list[1:]:
            text_lat_def += upperlower(i)
        return text_lat_def


    text_ua = 'ЗГУРОВСКАЯ-Згуровская-Язгуровская'
    # Each hyphen-separated part is treated as a separate word.
    text_lat = '-'.join(ua_in_lat(z) for z in text_ua.split('-'))
    print(text_lat)  # ZGHUROVSKAIA-Zghurovskaia-Yazghurovskaia
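
For comparison, the answer deliberately avoids regular expressions; a regex-based alternative might look like the sketch below. It is a hypothetical variant, not the answerer's code: `ua_to_lat`, `BASE`, `INITIAL` and `WORD` are illustrative names, the table is assumed to match the resolution linked above (lowercase, non-initial forms), and "г" after "з" is special-cased instead of storing "зг" as a key. Any run of letters, including letters joined by an apostrophe, counts as a word, so hyphens and spaces both start a new word:

```python
import re

# Lower-case, non-initial forms; "ь" and apostrophes produce nothing.
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia", "'": "", "’": "",
}
# Forms used only for the first letter of a word.
INITIAL = {"є": "ye", "ї": "yi", "й": "y", "ю": "yu", "я": "ya"}

# A word is a run of letters, possibly joined by apostrophes (Знам'янка);
# hyphens, spaces and punctuation separate words.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def _translit_word(m):
    word = m.group()
    out = []
    for i, ch in enumerate(word):
        low = ch.lower()
        if i == 0 and low in INITIAL:
            lat = INITIAL[low]
        elif low == "г" and i > 0 and word[i - 1].lower() == "з":
            lat = "gh"  # "зг" -> "zgh"
        elif low in BASE:
            lat = BASE[low]
        else:
            out.append(ch)  # not a Ukrainian letter: keep as is
            continue
        if ch.isupper():
            nxt = word[i + 1] if i + 1 < len(word) else ""
            prv = word[i - 1] if i > 0 else ""
            # All-caps context -> "ZH"; capital followed by lower case -> "Zh".
            if nxt.isupper() or (not nxt and prv.isupper()):
                lat = lat.upper()
            else:
                lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


def ua_to_lat(text):
    return WORD.sub(_translit_word, text)


print(ua_to_lat("ЗГУРОВСКАЯ-Згуровская-Язгуровская"))
# ZGHUROVSKAIA-Zghurovskaia-Yazghurovskaia
```

In this sketch the case of each Latin group is decided from the source letter and its neighbour: an upper-case letter becomes fully upper case when the next letter (or, at the end of a word, the previous one) is upper case too, and is capitalized otherwise.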
  • Thanks! Yes, the transliteration must be the official one, according to the current Cabinet of Ministers resolution. Which replacements exactly are wrong? I'm not sure I understood your point about the presence of a hyphen... - popoff
  • I was too hasty about the hyphen, and you have already corrected the replacements in item 1 (letters in the first position of a word). - vdm_mar