There is a document on the Internet in UTF-8

I am trying to fetch it as UTF-8 with curl:

    static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
    {
        ((std::wstring*)userp)->append((wchar_t*)contents, size * nmemb);
        return size * nmemb;
    }
    ...
    CURL *curl;
    CURLcode res;
    std::wstring readBuffer;
    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "url");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
        res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    int pos = readBuffer.find_first_of(L"World");

I set a breakpoint and inspected readBuffer and pos in the debugger.

Why don't I get what I need in UTF-8?

  • This is simply what Cyrillic looks like in that encoding ... A separate problem is that the text visualizer in Visual Studio does not let you choose the encoding of the displayed text. - VTT
  • The question should really be: how do you make Visual Studio guess the encoding and finally learn to display UTF-8? - KoVadim
  • Convert to wchar_t* and you will be happy. ru.stackoverflow.com/questions/839080/… - nick_n_a
  • @VTT, got it, I didn't know that - Mike Waters
  • @nick_n_a, and what is the difference between char and wchar_t, or between std::string and std::wstring? - Mike Waters
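
On the char versus wchar_t question in the comments: a small sketch, not taken from the thread, of why the two cannot be swapped by a cast. It assumes the page is UTF-8, where one Cyrillic letter occupies two bytes, so a std::string holding the text has more elements than the text has characters. The helper names (utf8_code_points, sample_utf8, demo_lengths) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 byte string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}

// The word "Привет" spelled out as explicit UTF-8 bytes, so the example
// does not depend on the encoding of the source file.
std::string sample_utf8() {
    return "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
}

// Illustrates the mismatch: 12 char elements, but only 6 characters.
void demo_lengths() {
    const std::string s = sample_utf8();
    assert(s.size() == 12);
    assert(utf8_code_points(s) == 6);
}
```

Casting such a byte buffer to wchar_t* does not decode anything. On Windows, where wchar_t is 16 bits, each pair of bytes is simply read as one unrelated code unit.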

1 answer

An option for Windows. Since I have often had to deal with encodings, I suggest trying a head-on solution: I can see that the wchar_t (Unicode) buffer actually holds UTF-8 data. You can convert it to char and back to Unicode. This is not the best solution: some characters (everything in the upper range that is not Cyrillic) will be lost.

    if(curl) { /*...*/ } // only here to show which piece the code goes after
    int utf8_len = readBuffer.length();
    char * tmp = (char*)malloc(utf8_len+1); // allocate space
    utf8_len = WideCharToMultiByte(CP_UTF8, 0, readBuffer.c_str(), utf8_len, tmp, utf8_len, 0, 0);
    tmp[utf8_len]=0;
    wchar_t * wtmp = (wchar_t*)malloc(utf8_len*2+2);
    wsprintfW(wtmp,L"%hs",tmp); // done, we now have standard Unicode
    readBuffer.assign(wtmp); // assign the result
    // and free everything
    free((void*)tmp);
    free((void*)wtmp);

Other solutions:

Try this variant, which copies the buffer into a char array and decodes it with MultiByteToWideChar:

    int utf8_len = readBuffer.length();
    char * tmp = (char*)malloc(utf8_len+1); // allocate space
    for (int i=0;i<utf8_len;i++)
        tmp[i]=(char)readBuffer[i];
    wchar_t * wtmp = (wchar_t*)malloc(utf8_len*2+2);
    utf8_len = MultiByteToWideChar(CP_UTF8, 0, tmp, utf8_len, wtmp, utf8_len);
    wtmp[utf8_len]=0;
    readBuffer.assign(wtmp); // assign the result
    free((void*)tmp);
    free((void*)wtmp);