Free translation of the question: What is the difference between utf8_general_ci and utf8_unicode_ci?
Both of these collations (utf8_general_ci and utf8_unicode_ci) work with UTF-8 characters; the difference is in how they sort and compare strings.
Note: since MySQL version 5.5.3, it is preferable to use utf8mb4 rather than utf8. Both are UTF-8 encodings, but the older utf8 has a MySQL-specific restriction: it stores at most three bytes per character, so it cannot hold characters above U+FFFF (for example, emoji).
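To make the restriction concrete, here is a small Python sketch (not MySQL code) that checks how many bytes a character needs in UTF-8 and whether a string would fit into the three-byte utf8 character set. The function names utf8_byte_length and fits_in_utf8mb3 are made up for this illustration.

```python
def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes a single character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character has a code point <= U+FFFF,
    i.e. needs at most three bytes in UTF-8 (what the old utf8 can store)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ["a", "ß", "€", "😀"]:
        print(repr(ch), utf8_byte_length(ch), fits_in_utf8mb3(ch))
```

Only the last character in the loop needs four bytes, which is the case that utf8mb4 handles and the old utf8 does not.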
Comparison of individual parameters.
Accuracy
utf8mb4_unicode_ci is based on the Unicode standard for sorting and string comparison, so it sorts strings more accurately across a wide range of languages and alphabets.
utf8mb4_general_ci does not implement all of the Unicode sorting rules, which leads to undesirable results in some situations for certain languages or characters.
Performance
utf8mb4_general_ci is faster at comparing and sorting because it takes a number of performance shortcuts.
On modern servers this speed advantage still exists, but it is small. The shortcuts were designed at a time when servers were far less powerful than they are today.
utf8mb4_unicode_ci, which follows the Unicode rules for sorting and comparing, uses considerably more complex algorithms in order to sort precisely across a wide range of languages and to handle special characters. These rules take language-specific conventions into account, so the result does not always follow plain "alphabetical" order.
For the so-called "European" languages there is, in principle, not much difference between strict Unicode sorting and the simplified sorting of utf8mb4_general_ci, but there are a few differences:
For example, utf8mb4_unicode_ci sorts "ß" the same as "ss" and "Œ" the same as "OE", as people do, while utf8mb4_general_ci treats each of them as a single character (presumably as "s" and "e" respectively).
Some Unicode characters are defined as ignorable, which means that they should not affect the sort order and the comparison should simply move on to the next character. utf8mb4_unicode_ci handles these characters correctly.
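The two differences above can be pictured with a toy Python model. This is only a sketch of the idea, not how MySQL implements its collations: the tables GENERAL_MAP, UNICODE_EXPANSIONS and IGNORABLE are tiny hand-picked assumptions, whereas the real collations use large weight tables.

```python
# Toy tables, assumed for illustration only.
GENERAL_MAP = {"ß": "s"}                      # one character maps to one character
UNICODE_EXPANSIONS = {"ß": "ss", "œ": "oe"}   # one character may expand to several
IGNORABLE = {"\u00ad"}                        # soft hyphen, treated as ignorable here


def general_key(text: str) -> str:
    """Simplified comparison key: strictly character by character."""
    return "".join(GENERAL_MAP.get(ch, ch) for ch in text.lower())


def unicode_key(text: str) -> str:
    """Comparison key with expansions and ignorable characters."""
    parts = []
    for ch in text.lower():
        if ch in IGNORABLE:
            continue  # skip it and move on to the next character
        parts.append(UNICODE_EXPANSIONS.get(ch, ch))
    return "".join(parts)


if __name__ == "__main__":
    print(unicode_key("Straße") == unicode_key("STRASSE"))
    print(general_key("Straße") == general_key("STRASSE"))
    print(unicode_key("co\u00adop") == unicode_key("coop"))
```

Two strings compare as equal under a model when their keys are equal, and sorting by the key gives that model's order.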
For non-European languages, such as Asian languages or languages with a different alphabet, there are many more differences between Unicode sorting and the simplified sorting in utf8mb4_general_ci. How suitable utf8mb4_general_ci is will depend on the particular language. For some languages it may be quite inadequate.
What to use?
It makes little sense to prefer utf8mb4_general_ci for performance reasons, because on modern processors the difference is unlikely to be a bottleneck.
There may still be a measurable performance difference in some highly specialized situations, and if that is your case you should be aware of it.
Previously, some experts recommended using utf8mb4_general_ci except where precise sorting was necessary and worth the performance cost. Today, correct internationalization support is given more weight than a slight loss of performance.
I will add that even if your application only needs to support English, it may still have to store people's names, and names often contain characters found in other languages, so it is important to use correct sorting rules. Using Unicode wherever possible will help you build better applications.
utf8mb4_unicode_ci. You must also have the connection in utf8mb4 mode when using utf8mb4. - Visman
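One way to put a connection into utf8mb4 mode is to send a SET NAMES statement right after connecting. The Python helper below only builds that statement as a string; the name set_names_statement is hypothetical, and actually executing the statement is left to whichever MySQL driver you use.

```python
import re

# Only plain identifiers are accepted, since the names are placed into SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def set_names_statement(charset: str = "utf8mb4",
                        collation: str = "utf8mb4_unicode_ci") -> str:
    """Build a SET NAMES statement for the given character set and collation."""
    for name in (charset, collation):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # A collation belongs to exactly one character set; its name starts with it.
    if not collation.startswith(charset + "_"):
        raise ValueError(f"{collation!r} is not a collation of {charset!r}")
    return f"SET NAMES {charset} COLLATE {collation}"


if __name__ == "__main__":
    print(set_names_statement())
```

Many drivers also accept the character set as a connection option, which achieves the same thing without a separate statement.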