Text accidentally typed in the wrong keyboard layout: automatic fix

Question

I need a function that corrects characters accidentally typed in the wrong keyboard layout, in this case EN->RU, and that also replaces the letters ё and Ё with е and Е. Here is what I wrote:

    string eng = "qwertyuiop[]asdfghjkl;'zxcvbnm,.QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>`~ёЁ";
    string ru = "йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮеЕеЕ";
    for (int i = 0; i < eng.Length; ++i)
        if (query.Contains(eng[i]))
            query = query.Replace(eng[i], ru[i]);

I really do not like this code. Can anyone help?

It looks like hack-job code; is there a more elegant solution?
Create an array where the index is the code of the wrong character and the value is the correct character. For example, the code of "q" is 113 and "q" must become "й", so tab[113] holds "й" (its numeric code depends on the encoding; in .NET a char is a UTF-16 code unit and "й" is U+0439). Then replace characters through this table.
Of course, for "correct" characters the value is equal to the index.
Or do the same with a string literal, where the position is the code of the wrong character and the character at that position is the correct one, but that will be slower.
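The table idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the class name LayoutTable and its members are hypothetical, the table covers every UTF-16 code unit (65536 entries), and the layout strings are the ones from the question.

```csharp
using System;

public static class LayoutTable
{
    // Layout strings taken from the question; the names Eng/Ru/Tab are illustrative.
    private const string Eng = "qwertyuiop[]asdfghjkl;'zxcvbnm,.QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>`~ёЁ";
    private const string Ru = "йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮеЕеЕ";

    // One slot per UTF-16 code unit: Tab[wrongChar] == correctChar.
    private static readonly char[] Tab = BuildTable();

    private static char[] BuildTable()
    {
        var tab = new char[char.MaxValue + 1];

        // Identity by default: "correct" characters map to themselves.
        for (int i = 0; i < tab.Length; i++)
            tab[i] = (char)i;

        // Overwrite only the slots of the characters that must be replaced.
        for (int i = 0; i < Eng.Length; i++)
            tab[Eng[i]] = Ru[i];

        return tab;
    }

    // Single pass over the input; returns a new string and leaves the input untouched.
    public static string Fix(string query)
    {
        char[] buffer = query.ToCharArray();
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = Tab[buffer[i]];
        return new string(buffer);
    }
}
```

With an identity default, the lookup needs no bounds check and no "not found" branch, at the cost of a 128 KB table.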

Accepted Answer · 2017-08-10T12:24:10

    using System;
    using System.Collections.Generic;
    using System.Text;

    public sealed class Replacer
    {
        private readonly Dictionary<Char, Char> _dictionary;

        public Replacer(String sourceSymbols, String targetSymbols)
        {
            if (sourceSymbols.Length != targetSymbols.Length)
                throw new NotSupportedException("sourceSymbols.Length != targetSymbols.Length");

            Int32 count = sourceSymbols.Length;
            Dictionary<Char, Char> dictionary = new Dictionary<Char, Char>(count);
            for (int i = 0; i < count; i++)
                dictionary.Add(sourceSymbols[i], targetSymbols[i]);
            _dictionary = dictionary;
        }

        public void FixCharacters(ref String query)
        {
            if (String.IsNullOrEmpty(query))
                return;

            if (String.IsInterned(query) == null)
            {
                // Not interned: the characters are overwritten in place.
                FixNotInternedString(query);
            }
            else
            {
                // Interned: build a new string instead of touching the shared one.
                FixInternedString(ref query);
            }
        }

        // Requires compiling with /unsafe.
        private unsafe void FixNotInternedString(String query)
        {
            Int32 index = query.Length - 1;
            fixed (Char* chPtr = query)
            {
                while (index >= 0)
                {
                    Char oldChar = chPtr[index];
                    Char newChar;
                    if (_dictionary.TryGetValue(oldChar, out newChar))
                        chPtr[index] = newChar;
                    index--;
                }
            }
        }

        private void FixInternedString(ref String query)
        {
            StringBuilder sb = new StringBuilder(query.Length);
            foreach (Char c in query)
            {
                Char fixedChar;
                if (_dictionary.TryGetValue(c, out fixedChar))
                    sb.Append(fixedChar);
                else
                    sb.Append(c);
            }
            query = sb.ToString();
        }
    }

Usage:

    static void Main(string[] args)
    {
        String eng = "qwertyuiop[]asdfghjkl;'zxcvbnm,.QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>`~ёЁ";
        String rus = "йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮеЕеЕ";
        Replacer replacer = new Replacer(eng, rus);
        for (int i = 0; i < 10; i++)
        {
            String query = $"Hello World {i}";
            replacer.FixCharacters(ref query);
            Console.WriteLine(query); // "Руддщ Цщкдв 0", "Руддщ Цщкдв 1", ...
        }
    }

Answer 2 · 2017-08-10T12:01:38

1). You can build an "EN->RU" dictionary from the eng and ru strings, then walk through the given string once and replace each character according to the dictionary. This way the source string is not scanned repeatedly, and only one new string is created:

    public static class LangConversion
    {
        private static readonly Dictionary<char, char> engToRu = new Dictionary<char, char>();

        static LangConversion()
        {
            var eng = "qwertyuiop[]asdfghjkl;'zxcvbnm,.QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>`~ёЁ";
            var ru = "йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮеЕеЕ";
            for (var i = 0; i < eng.Length; i++)
                engToRu[eng[i]] = ru[i];
        }

        public static string Fix(string str)
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                char fixedChar;
                sb.Append(engToRu.TryGetValue(c, out fixedChar) ? fixedChar : c);
            }
            return sb.ToString();
        }
    }

2). If performance matters more than elegance, the replacement can be implemented with a switch-case:

    public static class LangConversion2
    {
        public static string Fix(string str)
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                sb.Append(Replace(c));
            }
            return sb.ToString();
        }

        private static char Replace(char c)
        {
            switch (c)
            {
                case 'q': return 'й';
                ...
                case 'Ё': return 'Е';
                default: return c;
            }
        }
    }

The full switch contents are in the fiddle.

3). Another way to perform the replacement, at a small additional memory cost, is to use an array:

    public static class LangConversion3
    {
        private static readonly char[] engToRu;

        static LangConversion3()
        {
            var eng = "qwertyuiop[]asdfghjkl;'zxcvbnm,.QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>`~ёЁ";
            var ru = "йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮеЕеЕ";

            // The array only needs to reach the largest source character code.
            int maxCharCode = 0;
            foreach (char c in eng)
                maxCharCode = c > maxCharCode ? c : maxCharCode;

            engToRu = new char[maxCharCode + 1];
            for (var i = 0; i < eng.Length; i++)
                engToRu[eng[i]] = ru[i];
        }

        public static string Fix(string str)
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                sb.Append(Replace(c));
            }
            return sb.ToString();
        }

        private static char Replace(char c)
        {
            if (c >= engToRu.Length)
                return c;
            // A zero entry means "no replacement for this character".
            var fixedChar = engToRu[c];
            return fixedChar != 0 ? fixedChar : c;
        }
    }

Timings on a string of 100 thousand characters, 1000 iterations:

    Original version:                  6480 ms
    Dictionary:                        2550 ms
    Pointers (answer by @LunarWhisper): 1560 ms
    Switch-case:                       1520 ms
    Array:                             1310 ms
    Pointers + array:                   720 ms
    Pointers + switch-case:             580 ms
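The answer does not show how the timings were taken. A minimal sketch of one way to measure them, assuming System.Diagnostics.Stopwatch and a synthetic input, could look like this; FixBenchmark, MeasureMs and MakeInput are hypothetical names, and the sample text is an assumption rather than the data used above.

```csharp
using System;
using System.Diagnostics;

public static class FixBenchmark
{
    // Runs fix(input) the given number of times and returns the elapsed milliseconds.
    public static long MeasureMs(Func<string, string> fix, string input, int iterations)
    {
        fix(input); // warm-up call so JIT compilation is not part of the measurement

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            fix(input);
        stopwatch.Stop();

        return stopwatch.ElapsedMilliseconds;
    }

    // Builds a test string of the requested length by repeating a short sample
    // typed in the wrong layout (the sample itself is an assumption).
    public static string MakeInput(int length)
    {
        const string sample = "ghbdtn vbh ";
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = sample[i % sample.Length];
        return new string(chars);
    }
}
```

For example, FixBenchmark.MeasureMs(LangConversion.Fix, FixBenchmark.MakeInput(100000), 1000) would time the dictionary variant. Absolute numbers depend on the machine, the runtime and the build configuration, so only the ratios between variants are meaningful.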

@DmitryChistik with a dictionary it does at least come out faster, although in my version the performance gain was not very large.
Thanks for this variant; compared with Lunar Whisper's answer it is a quarter faster!
I think that after this change your version will come close to mine in speed.
@LunarWhisper by the way, it would be interesting to see the results of your version when sped up with a switch-case.

Answer 3 · 2017-08-10T15:24:26

Don't overengineer it

The lack of clear criteria for beauty "in appearance" usually leads to fixes in the direction of "beautiful" but complex code: separate replacer classes, low-level work with Char (it is unclear how that behaves on surrogate pairs), string builders, and so on.

The lack of clear criteria for beauty "in speed" usually leads to "optimization" where it is not needed. You have code for fixing "randomly typed characters in another language". Let's test it on strings of 100,000 characters over 1000 iterations! Users write War and Peace every day while forgetting to switch the layout, and then get upset that the code takes a whole 6 milliseconds to fix the mistake. For them, 0.5 ms is a tangible difference!

In your case:

There are no strict performance requirements (really justified ones, not "it would be cool"), and no clear requirements for beauty (IMHO, code that solves this task in more than 10 lines is terrible!).
There are no memory requirements (your version, for example, spends (string length)^2 bytes, but that garbage is discarded immediately and never gets past gen 0).

The only remaining requirement is readability, that is, maintainability.

Readability can be measured with standard metrics. Visual Studio itself can calculate Code Metrics.

So, provided that eng and ru are declared static, two options win on the Maintainability Index:

    for (int i = 0; i < eng.Length; ++i)
        query = query.Replace(eng[i], ru[i]);

and a slightly newer one, essentially the same:

    // computed once, an optimization!
    static (char, char)[] dictionary = eng.Zip(ru, (a, b) => (a, b)).ToArray();

    //....
    foreach (var (e, r) in dictionary)
        query = query.Replace(e, r);

The same, inlined:

    foreach (var (e, r) in eng.Zip(ru, (a, b) => (a, b)))
        query = query.Replace(e, r);

Within the constraints you set, every other option is self-indulgence.

Unfortunately, a single Char in .NET cannot hold a surrogate pair: it is one UTF-16 code unit.
You can check this by simply iterating over the string "_\ud800\udc00_", so the result in both cases will be identical: it will not work at all as long as we operate at the character level and have not dropped down to bytes.
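The check mentioned in the comment above can be sketched as follows. SurrogateDemo and CodeUnits are hypothetical names used only for illustration; the Char helpers are standard .NET members.

```csharp
using System;
using System.Collections.Generic;

public static class SurrogateDemo
{
    // Collects what foreach yields for a string: UTF-16 code units, not code points.
    public static List<int> CodeUnits(string s)
    {
        var units = new List<int>();
        foreach (char c in s)
            units.Add(c);
        return units;
    }
}
```

For "_\ud800\udc00_" this returns four code units (0x5F, 0xD800, 0xDC00, 0x5F), although the string contains only three code points. Char.IsHighSurrogate and Char.ConvertToUtf32(string, int) can be used to detect and decode the pair, so any per-Char replacement, safe or unsafe, sees the two halves separately.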
Essentially true, but I would argue about the readability of the last solution.
@LunarWhisper and what is the problem with the readability of the last (or second-to-last) solution?
It is two lines in total, nothing esoteric, not some kind of code golf: an ordinary loop with Replace.
It is just that some people never accepted LINQ wholeheartedly (or even foreach), do not trust the compiler to infer types, and prefer "manual mode".
Any modern JS developer or Pythonista, for example, reads this easily without even knowing C#.
But again: the proposed replacement iterates over the string N times, where N is the size of the dictionary... and why?
Later the same code will be copied into another method that replaces tags in a 500 MB XML file, and there you go.
When it comes to a linear dependence, O(N) or O(3N), you can choose any option.
@LunarWhisper Because I spent a dozen seconds writing this method.
It is a simple, readable variant that solves the task set by the topic starter: fixing text accidentally typed in the wrong layout.
Why stretch it into 20+ lines with unsafe if the same thing can be solved in a simple way?
If you copy it into code that replaces tags in 500 MB of XML, you will immediately realize that you did something wrong.
You can afford even O(n^6) if the real running time at the maximum N is milliseconds.
You cannot write code based on "what if" and invent conditions on the go.
In many years of working with C#, I have dealt with 500 MB XML exactly 0 times.
It is unreasonable to write every line of code with every difficult situation in mind when I have never met those situations in practice :)
