Good evening. I just cannot write messages/comments from VK that contain emoji to the database. I'm using Python + MySQLdb + peewee. I changed the MySQL encoding to utf8mb4:

    SET NAMES utf8mb4;
    SET CHARACTER SET utf8mb4;
    SET character_set_connection=utf8mb4;
    ALTER DATABASE first_db CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    ALTER TABLE first_db.vkpost CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    ALTER TABLE first_db.vkcomment CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    ALTER TABLE first_db.vkpost CHANGE content content longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
    ALTER TABLE first_db.vkcomment CHANGE content content longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;

However, whatever I do, the result is always the same:

    (1366, u"Incorrect string value: '\\xF0\\x9F\\x98\\x8A \\xD0...' for column 'content' at row 1")
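The bytes in the error message are worth decoding. A quick Python 3 check (an illustration, not part of the original question) shows that the first four bytes are a single emoji, U+1F60A, which takes four bytes in UTF-8. MySQL's legacy utf8 charset stores at most three bytes per character, so the column, the table and the client connection all have to use utf8mb4.

```python
# Bytes copied from the MySQL error message; the trailing '\xD0...' is truncated,
# so only the first complete character is decoded here.
raw = b"\xF0\x9F\x98\x8A"

ch = raw.decode("utf-8")
print(ascii(ch), hex(ord(ch)))   # '\U0001f60a' 0x1f60a
print(len(ch.encode("utf-8")))   # 4

# MySQL's old "utf8" (utf8mb3) holds at most 3 bytes per character,
# i.e. only code points up to U+FFFF (the Basic Multilingual Plane).
print(ord(ch) > 0xFFFF)          # True
```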

If I enter the data manually, MySQL Workbench gives the same error.

How can I fix this, or at least strip the emoji from the text? Stripping them might even be better, so as not to store too much.
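If you do decide to strip them (andreymal argues against it in the comments), a minimal sketch, assuming Python 3 (where a str holds whole code points, unlike a Python 2 narrow build), is to drop every character above U+FFFF. Those are exactly the characters that need four bytes in UTF-8. The function name strip_non_bmp is made up for this example.

```python
def strip_non_bmp(text: str) -> str:
    """Drop every character outside the Basic Multilingual Plane (> U+FFFF).

    These are the characters that need 4 bytes in UTF-8 and therefore cannot be
    stored in a column using MySQL's legacy 3-byte "utf8" charset. Most emoji
    are removed, but BMP symbols such as U+263A are kept, since utf8 can store them.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


comment = "Nice post \U0001F60A\U0001F44D!"
print(strip_non_bmp(comment))  # Nice post !
```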

  • What query do you use to write the data? Is the correct encoding set in MySQLdb? Are you passing a Unicode string and not a byte string? In short, show us the Python code. - andreymal
  • And you shouldn't strip the emoji; users (me included) will hate you for it :) - andreymal
  • Python has nothing to do with it; I tried without it and got the same error. - Skotinin
  • Maybe MySQL Workbench just has a bug; better show the Python code. Emoji have always been stored perfectly fine for me. - andreymal
  • I forgot to specify the encoding in the connection: db = MySQLDatabase('first_db', host='0.0.0.0', port=3306, user='user', password='1233', charset='utf8mb4') :) - Skotinin

2 answers

In the MySQL config (for example, in /etc/mysql/conf.d/unicode.cnf), specify the following in the relevant section:

    [mysqld]
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    skip-character-set-client-handshake

Restart MySQL and check:

    mysql --silent --raw <<DOC | column -t
    show variables like 'character\_set\_%';
    show variables like 'collation%';
    DOC

It should output something like this:

    character_set_client      utf8mb4
    character_set_connection  utf8mb4
    character_set_database    utf8mb4
    character_set_filesystem  binary
    character_set_results     utf8mb4
    character_set_server      utf8mb4
    character_set_system      utf8
    collation_connection      utf8mb4_unicode_ci
    collation_database        utf8mb4_unicode_ci
    collation_server          utf8mb4_unicode_ci
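To check that output automatically, here is a hypothetical Python helper (not part of the answer) that parses the two-column text and lists every charset/collation variable that is not utf8mb4. It skips character_set_filesystem and character_set_system, which are expected to differ, as in the output shown. The sample input is made up to illustrate a misconfigured server.

```python
# character_set_filesystem (binary) and character_set_system (utf8)
# are not expected to be utf8mb4, so they are ignored.
EXPECTED_EXCEPTIONS = {"character_set_filesystem", "character_set_system"}


def non_utf8mb4_settings(output: str) -> dict:
    """Return {variable: value} for settings that are not utf8mb4*."""
    problems = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        if name in EXPECTED_EXCEPTIONS:
            continue
        if not value.startswith("utf8mb4"):
            problems[name] = value
    return problems


# Made-up example of a server that still uses the 3-byte utf8 for the client side.
sample = """\
character_set_client      utf8
character_set_connection  utf8mb4
character_set_filesystem  binary
character_set_system      utf8
collation_connection      utf8_general_ci
"""
print(non_utf8mb4_settings(sample))
# {'character_set_client': 'utf8', 'collation_connection': 'utf8_general_ci'}
```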

Alternatively, when writing emoji to the database, you can replace them with an internal code and, when reading text back from the database, run it through the inverse function. This guarantees that the emoji survive (which is especially important when exporting/importing the database, where an incorrect encoding can cut out some special characters) and also lets you use emoticons that are not part of standard Unicode.
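A minimal Python 3 sketch of that idea, assuming a made-up token format ":u<hex>:" for every character above U+FFFF (the answer does not specify one). The encoded text is plain ASCII for those characters, so it fits even a 3-byte utf8 column. One caveat: ordinary text that already looks like a token would also be decoded; a real implementation would need to escape such sequences.

```python
import re

# Hypothetical token format: each character above U+FFFF becomes ":u<hex>:".
_TOKEN = re.compile(r":u([0-9a-f]{5,6}):")


def encode_emoji(text: str) -> str:
    """Replace 4-byte (non-BMP) characters with ASCII tokens before saving."""
    return "".join(":u%x:" % ord(ch) if ord(ch) > 0xFFFF else ch for ch in text)


def _token_to_char(match: "re.Match") -> str:
    cp = int(match.group(1), 16)
    # Only decode values that encode_emoji could have produced; leave others as-is.
    return chr(cp) if 0xFFFF < cp <= 0x10FFFF else match.group(0)


def decode_emoji(text: str) -> str:
    """Inverse of encode_emoji: turn tokens back into characters after loading."""
    return _TOKEN.sub(_token_to_char, text)


original = "Nice post \U0001F60A!"
stored = encode_emoji(original)
print(stored)                            # Nice post :u1f60a:!
print(decode_emoji(stored) == original)  # True
```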