I had code that reversed each line of a file in a single-byte encoding. Now I need to do the same for UTF-8, so I rewrote it using wide characters (code below). It still reverses the strings, but some characters are lost. I understand that the problem is in the fseek calls at the end of the loop body, but I cannot solve it myself.

    #define NULL_TERMINATOR '\0'
    #define NEW_LINE '\n'

    int main()
    {
        size_t i = 0;
        file = _wfopen(L"rev.txt", L"r+, ccs=UTF-8");
        wchar_t b[4096]{ NULL_TERMINATOR };
        while ((fgetws(b, sizeof(b), file)) != NULL)
        {
            if (b[wcslen(b) - 1] == NEW_LINE)
            {
                b[wcslen(b) - 1] = NULL_TERMINATOR;
                wcsrev(b);
                b[wcslen(b)] = NEW_LINE;
            }
            else
                wcsrev(b);
            fseek(file, i, SEEK_SET);
            int t = fwrite(b, wcslen(b) * 2, count, file);
            i += wcslen(b) * 2 + 3;
            fseek(file, i, SEEK_SET);
        }
        return 0;
    }
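
One possible way around the offset arithmetic, offered only as a sketch: in UTF-8 the file offset of a line is counted in bytes, not in `wchar_t` units times two, and reversing the code points of a line does not change its byte length. So the reversal can be done on the raw bytes and written back over the same range. The names `reverse_utf8` and `reverse_lines_utf8` are hypothetical, and the code assumes well-formed UTF-8 input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Reverses the code points of a UTF-8 string in place.
// Assumption: `s` is well-formed UTF-8 and holds no line terminator.
void reverse_utf8(std::string& s) {
    const std::size_t n = s.size();
    // Step 1: reverse the bytes inside every single code point.
    std::size_t i = 0;
    while (i < n) {
        std::size_t j = i + 1;
        while (j < n && is_continuation(static_cast<unsigned char>(s[j]))) ++j;
        std::reverse(s.begin() + i, s.begin() + j);
        i = j;
    }
    // Step 2: reverse the whole buffer. Each code point ends up with its
    // bytes back in the original order, but the code points are reversed.
    std::reverse(s.begin(), s.end());
    assert(s.size() == n);  // the byte length never changes
}

// Reverses every line of `text` in place, leaving "\n" and "\r\n" where they are.
void reverse_lines_utf8(std::string& text) {
    std::size_t start = 0;
    while (start < text.size()) {
        const std::size_t nl = text.find('\n', start);
        const std::size_t end = (nl == std::string::npos) ? text.size() : nl;
        std::size_t content_end = end;
        if (content_end > start && text[content_end - 1] == '\r') --content_end;
        std::string line = text.substr(start, content_end - start);
        reverse_utf8(line);
        text.replace(start, line.size(), line);  // same length, so offsets stay valid
        start = end + 1;
    }
}
```

With this approach the file would be opened in binary mode, read into a `std::string`, passed to `reverse_lines_utf8`, and written back from offset 0. A byte order mark at the start of the file, if present, would have to be skipped first; that is not handled here.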
  • 3
    @Kirill21 Only spaces, punctuation marks and Latin characters remain single-byte. Without reading the first byte of a character it is impossible to predict how long it is. Generally UTF-8 characters can be up to 5 bytes long - Mike
  • 3
    Your plan is doomed to failure: if a symbol consists of two code points (for example, a letter and a diacritic), their order must be preserved during reversal. - Abyx
  • 2
    @alexolut And it seems to me that UTF-8 has no limit on the size of a character at all. From Wikipedia: "The UTF-8 algorithm technically allows a code of any length to be written. But for the algorithm to work efficiently and reliably, the code length has to be limited. The current Unicode 6.x standard assumes codes of up to 21 bits, that is, up to four bytes in UTF-8." - Max ZS
  • 2
    @Kirill21 By the way, Abyx is right: when you reverse, you may run into other problems, such as combined characters, for example the letter Y. stackoverflow.com/questions/481050/… And be thankful this is only Russian: UTF-8 allows up to 3 combining characters to be attached to a character, and almost all libraries treat them as separate characters. You don't have to look far: JavaScript reports a length of 2 characters for the combined Q - Mike
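
The two points raised in the comments (the length of a sequence is known only from its lead byte, and a combining mark must stay after its base letter) can be illustrated with a rough sketch. All function names here are hypothetical. The combining-mark test is a deliberate simplification that covers only the block U+0300 to U+036F; a full solution would follow the grapheme cluster rules of Unicode Standard Annex #29, typically through a library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Byte length of a UTF-8 sequence, judged from its lead byte.
// Returns 0 for a continuation byte or an invalid lead byte.
// Simplification: overlong forms and leads 0xF5..0xF7 are not rejected.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Decodes the code point of `len` bytes starting at s[pos].
// Assumption: the bytes form a well-formed sequence.
char32_t decode_at(const std::string& s, std::size_t pos, std::size_t len) {
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (len == 1) return lead;
    char32_t cp = lead & (0xFF >> (len + 1));  // payload bits of the lead byte
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + k]) & 0x3F);
    return cp;
}

// Assumption: only the Combining Diacritical Marks block counts as combining.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Reverses a UTF-8 string while keeping each base character together
// with the combining marks that follow it.
std::string reverse_keeping_marks(const std::string& s) {
    std::vector<std::string> clusters;
    std::size_t pos = 0;
    while (pos < s.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[pos]));
        if (len == 0 || pos + len > s.size()) len = 1;  // malformed: keep the byte as its own unit
        const char32_t cp = decode_at(s, pos, len);
        if (is_combining_mark(cp) && !clusters.empty())
            clusters.back().append(s, pos, len);  // attach the mark to its base
        else
            clusters.emplace_back(s, pos, len);   // start a new cluster
        pos += len;
    }
    std::string out;
    out.reserve(s.size());
    for (auto it = clusters.rbegin(); it != clusters.rend(); ++it) out += *it;
    return out;
}
```

For instance, a Cyrillic letter followed by U+0306 (combining breve) is two code points but is moved as one unit, so the breve stays on its letter after reversal.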
