C++ curl: getting UTF-8 text in Cyrillic

Question

There is a document on the Internet encoded in UTF-8.

I am trying to fetch it as UTF-8 with curl:

    static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
    {
        ((std::wstring*)userp)->append((wchar_t*)contents, size * nmemb);
        return size * nmemb;
    }
    ...
    CURL *curl;
    CURLcode res;
    std::wstring readBuffer;
    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "url");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
        res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    int pos = readBuffer.find_first_of(L"World");
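libcurl passes the write callback raw bytes: contents points to size * nmemb bytes of body data (the libcurl documentation notes that size is always 1), not to wide characters. Casting that pointer to wchar_t* and appending size * nmemb wide characters packs two (Windows) or four (Linux) bytes into each wchar_t and reads past the end of the delivered chunk. A minimal sketch of a callback that keeps the body as bytes in a std::string instead; the name WriteBytesCallback is hypothetical, and it would be registered with CURLOPT_WRITEFUNCTION and CURLOPT_WRITEDATA the same way as in the code above, with CURLOPT_WRITEDATA pointing at a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical replacement for WriteCallback: stores the response body as
// raw bytes, so UTF-8 stays UTF-8; decode to wide characters only afterwards.
static std::size_t WriteBytesCallback(void *contents, std::size_t size, std::size_t nmemb, void *userp)
{
    const std::size_t total = size * nmemb;            // number of bytes in this chunk
    auto *buffer = static_cast<std::string *>(userp);  // the object set via CURLOPT_WRITEDATA
    buffer->append(static_cast<const char *>(contents), total);
    return total;                                      // report every byte as handled
}
```

Converting to std::wstring once, after curl_easy_perform returns, also avoids a multibyte UTF-8 sequence being split across two callback invocations.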

[Screenshot: readBuffer at a breakpoint]

[Screenshot: value of pos]

Why am I not getting the UTF-8 text I need?
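A side point about pos: std::wstring::find_first_of returns the index of the first character that appears anywhere in its argument, so find_first_of(L"World") matches a single letter such as 'l' or 'o', not the word. The substring search is find, which returns std::wstring::npos when there is no match. A tiny sketch; the helper name FindSubstring is hypothetical:

```cpp
#include <string>

// Hypothetical helper: position where `needle` starts inside `text`,
// or std::wstring::npos if it does not occur.
// By contrast, text.find_first_of(needle) returns the first position
// holding ANY single character of `needle`.
std::wstring::size_type FindSubstring(const std::wstring &text, const std::wstring &needle)
{
    return text.find(needle);
}
```

For L"Hello World", find_first_of(L"World") returns 2 (the first 'l'), while FindSubstring(text, L"World") returns 6.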

That is just Cyrillic in a normal encoding... A separate issue is that the text visualizer in Visual Studio does not let you choose the encoding of the displayed text.
The question should really be: how do you make Visual Studio detect the encoding and finally display UTF-8 correctly?
@nick_n_a, and what is the difference between char and wchar_t, then?

Answer 1 · 2018-11-01T14:38:47

An option for Windows. Since I have often had to deal with encodings, I suggest trying a head-on solution: as far as I can see, the wchar_t data in the Unicode string actually holds UTF-8. You can convert it to char and back to Unicode. This is not the best solution: some characters (everything in the upper range that is not Cyrillic) will be lost.

    if(curl) { /*...*/ } // just to show after which block to paste this code
    int utf8_len = readBuffer.length();
    char * tmp = (char*)malloc(utf8_len+1); // request space
    utf8_len = WideCharToMultiByte(CP_UTF8, 0, readBuffer.c_str(), utf8_len, tmp, utf8_len, 0, 0);
    tmp[utf8_len]=0;
    wchar_t * wtmp = (wchar_t*)malloc(utf8_len*2+2);
    wsprintfW(wtmp, L"%hs", tmp); // that's it, we now have standard Unicode
    readBuffer.assign(wtmp); // assign the result
    // and free everything
    free((void*)tmp);
    free((void*)wtmp);

Other solutions:

- ADO has a converter.
- Search for information on encoding conversion.
- You can simply copy the wchar_t values into a char array in a loop and use MultiByteToWideChar instead of WideCharToMultiByte, though I have not tried it... http://msdn.microsoft.com/en-us/magazine/mt763237.aspx
- Perhaps this helps: Need an example of using g_convert()

Try this solution:

    int utf8_len = readBuffer.length();
    char * tmp = (char*)malloc(utf8_len+1); // request space
    for (int i=0;i<utf8_len;i++) tmp[i]=(char)readBuffer[i]; // narrow each wchar_t to one byte
    wchar_t * wtmp = (wchar_t*)malloc(utf8_len*2+2);
    utf8_len = MultiByteToWideChar(CP_UTF8, 0, tmp, utf8_len, wtmp, utf8_len);
    wtmp[utf8_len]=0;
    readBuffer.assign(wtmp); // assign the result
    free((void*)tmp);
    free((void*)wtmp);
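MultiByteToWideChar and WideCharToMultiByte are Windows-only. If the response is first kept as raw bytes in a std::string (rather than reinterpreted as wchar_t), it can be decoded without any code-page round trip, so no non-Cyrillic characters are lost. A minimal portable sketch, assuming mostly well-formed UTF-8; the name Utf8ToWide is hypothetical, malformed sequences are replaced with U+FFFD, and overlong or surrogate encodings are not rejected (this is not a strict validator).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into a std::wstring.
// 16-bit wchar_t (Windows): code points above U+FFFF become surrogate pairs.
// 32-bit wchar_t (Linux/macOS): one wchar_t per code point.
std::wstring Utf8ToWide(const std::string &bytes)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;                            // continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }

        bool ok = i + extra < bytes.size();               // sequence not truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;           // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }
        i += 1 + extra;
        if (cp > 0x10FFFF) cp = 0xFFFD;                   // outside the Unicode range

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {        // encode as a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

With the body collected as bytes, Utf8ToWide(body).find(L"World") then searches real characters rather than packed bytes.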
