Specifically, I use ofstream, and Russian letters are written into the file itself correctly, but the file name is a problem. Whatever I try, the file name comes out as something like " .txt (invalid encoding)".


Maybe I did not ask the question quite correctly: I need the program to save txt files named аа.txt, аб.txt, ав.txt, ..., яя.txt.

I will briefly describe the program. There is a dictionary of words arranged in alphabetical order, and the program must split this dictionary into 33 * 33 documents (minus 3 * 32, since no words begin with ь, ы, ъ). The document аб.txt will contain all the words beginning with аб, and so on.

The contents written to the files are correct; the only problem is that the file names are wrong.

Here is my main() function:

    int main() {
        if (rfile.is_open()) {
            while (getline(rfile, line)) {
                string first, second;
                try {
                    first = line[0];
                    second = line[1];
                } catch (...) {
                    continue;
                }
                if (first == " " || first == "." || first == "-" || first == "|" || first == "," ||
                    second == " " || second == "|" || second == "-" || second == "." || second == ",")
                    continue;
                string wfilename = first + second + ".txt";
                ofstream wfile;
                wfile.open(wdirectory + wfilename, ios_base::app);
                if (wfile.is_open())
                    wfile << line;
                wfile.close();
            }
            rfile.close();
        } else
            cout << "Unable to open file" << endl;
        return 0;
    }

Linux Mint x64


Solution found! I was taking the first two letters of the line and composing the file name from them. For everything to work correctly, the source file had to be converted to UTF-8. On Linux the command looks like this: iconv -f windows-1251 < /home/user/filename.txt > /home/user/newEncodedFilename.txt (the < and > signs are required). Then use the new file. Thank you, sercxjo.

  • Show how you write Russian letters to the file. - Vladimir Gamalyan
  • What is the OS? Windows has two modes, ANSI and OEM. Maybe you have an OEM encoding? Copy exactly how your file name looks. Look at how characters 128, 140, 160, 200, 220 are displayed; then it will be clear which encoding the file name is in. Try a name given as UTF-8 bytes: "\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB.txt". If the letters come out Russian, then you need to think about converting to UTF-8; if not, you have a different encoding. - nick_n_a
  • @nick and how do I look? In the debugger? - Herrgott
  • It won't work in the debugger. wfile.open("\x80.txt", ios_base::app) is how to do it. Once you understand which encoding it is, things get easier. Although in the debugger you could create files with different names so as not to rebuild the project. - nick_n_a
  • @nick I tried, it comes out as " .txt (invalid encoding)" - Herrgott
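The probe suggested in the comments can be sketched as a small helper. This is only an illustration: `create_utf8_probe` is a hypothetical name, and the byte string is the UTF-8 encoding of "Файл.txt" taken from the comment above. If the file manager shows the name in Russian, the file system side is working with UTF-8.

```cpp
#include <fstream>
#include <string>

// Creates an empty file inside dir whose name is the UTF-8 byte sequence
// for "Файл.txt". Returns the full path, or "" if the file could not be opened.
std::string create_utf8_probe(const std::string& dir) {
    const std::string name = "\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB.txt";
    const std::string path = dir + "/" + name;
    std::ofstream probe(path.c_str());
    return probe.is_open() ? path : "";
}
```

Calling `create_utf8_probe(".")` and then listing the directory shows how the system renders a name that is known to be valid UTF-8.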

2 answers

Russian letters in UTF-8 usually take 2 bytes. By taking the first two bytes of the string you most likely get only one Russian letter; and if the first character is something else, or its UTF-8 representation is longer than 2 bytes, you cut a UTF-8 sequence off in the middle and get an invalid one. For the first byte of a multi-byte sequence (x & 192) == 192, and for the remaining bytes (x & 192) == 128. Using these properties you can cut out the first and second letters (more precisely, the first byte encodes the length of the sequence in unary, as its number of leading one bits, but here we rely on the source data being correct). The following function finds the length of a character:

    // Length in bytes of the UTF-8 character that starts at x[start]
    int wlen(const string &x, int start) {
        if (x[start] == 0) return 0;               // end of string
        if ((x[start] & 192) != 192) return 1;     // not a multi-byte lead byte
        int i = 1;
        while ((x[start + i] & 192) == 128) i++;   // count continuation bytes
        return i;
    }

It remains to replace the way the first and second characters of the line are obtained:

    first = line.substr(0, wlen(line, 0));
    second = line.substr(first.size(), wlen(line, first.size()));

Then you can add a check for first.size()==0 || second.size()==0.
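Putting these pieces together, a stricter variant might look like the sketch below. The names `utf8_char_len` and `two_letter_prefix` are hypothetical and not part of the original program. Unlike `wlen`, this version reads the expected length from the lead byte and returns an empty result on structurally malformed input, so the size check is built in. It assumes the line is already UTF-8 and does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 character starting at s[start],
// or 0 at the end of the string or when the bytes are structurally malformed.
std::size_t utf8_char_len(const std::string& s, std::size_t start) {
    assert(start <= s.size());
    if (start == s.size()) return 0;
    const unsigned char lead = static_cast<unsigned char>(s[start]);
    std::size_t len;
    if (lead < 0x80) len = 1;                 // 0xxxxxxx: ASCII
    else if ((lead & 0xE0) == 0xC0) len = 2;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx
    else return 0;                            // stray continuation or invalid byte
    if (start + len > s.size()) return 0;     // sequence is cut off
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[start + i]);
        if ((c & 0xC0) != 0x80) return 0;     // continuation must be 10xxxxxx
    }
    return len;
}

// First two characters of a UTF-8 line, or "" if they cannot be extracted.
std::string two_letter_prefix(const std::string& line) {
    const std::size_t n1 = utf8_char_len(line, 0);
    if (n1 == 0) return "";
    const std::size_t n2 = utf8_char_len(line, n1);
    if (n2 == 0) return "";
    return line.substr(0, n1 + n2);
}
```

In the loop from the question, the file name would then be `two_letter_prefix(line) + ".txt"`, with lines that yield an empty prefix skipped.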

From the conversation in chat it turned out that the source file is in windows-1251 encoding. To convert it to the encoding used in Linux Mint, you can use the iconv command:

    iconv -f windows-1251 < source_file > new_file

If you want to build the transcoding into the program itself, you can use the libiconv example.
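As a rough illustration of in-program transcoding, here is a minimal sketch built on the `iconv_open` / `iconv` / `iconv_close` functions from `<iconv.h>`. The helper name `cp1251_to_utf8` is hypothetical, and the sketch assumes the installed iconv implementation accepts the encoding name "WINDOWS-1251"; on systems where iconv lives in a separate library, linking with -liconv may be needed.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert a windows-1251 byte string to UTF-8 using iconv.
std::string cp1251_to_utf8(const std::string& input) {
    iconv_t cd = iconv_open("UTF-8", "WINDOWS-1251");   // (to, from)
    if (cd == (iconv_t)(-1)) throw std::runtime_error("iconv_open failed");

    // A windows-1251 byte expands to at most 3 UTF-8 bytes.
    std::string output(input.size() * 3 + 1, '\0');
    char* in = const_cast<char*>(input.data());
    size_t in_left = input.size();
    char* out = &output[0];
    size_t out_left = output.size();

    const size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)(-1)) throw std::runtime_error("iconv conversion failed");

    output.resize(output.size() - out_left);            // drop the unused tail
    return output;
}
```

Each line read from the dictionary could be passed through such a function before the first two letters are taken.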

  • Still the same. In the debugger: first = <incomplete sequence \340>, second is the same, wfilename = "\340\340.txt" - Herrgott
  • And what is the source line? - sercxjo
  • абдикация [отречение; отказ от сана и власти правителя, сложение с себя этого звания (Даль)] см. отказ - Herrgott
  • Maybe your file is not in UTF-8? How does the line look at that same point in the debugger? - sercxjo
  • "\347\340\362\377\355\363\362\356\351|\347\340\342\377\347\340\355\355\356\351\r" - Herrgott
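The octal escapes in that debugger output are single bytes in the range \340..\377, which is how windows-1251 encodes lowercase Russian letters, and such a run of bytes is not valid UTF-8. A quick structural check along the following lines could flag such input early. `is_structurally_utf8` is a hypothetical helper; it only checks lead and continuation byte patterns and does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>

// Returns true if every byte sequence in s has the shape of a UTF-8 character:
// a valid lead byte followed by the right number of 10xxxxxx bytes.
bool is_structurally_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;                             // continuation bytes expected
        if (lead < 0x80) extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                             // invalid lead byte
        if (extra > s.size() - i - 1) return false;    // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

Running the first line of the dictionary through such a check before splitting would show immediately whether the file needs converting.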

Check your system locale and what exactly is in the files. For me everything works fine:

1:

    $ locale
    LANG=ru_RU.UTF-8
    LANGUAGE=
    LC_CTYPE="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_PAPER="ru_RU.UTF-8"
    LC_NAME="ru_RU.UTF-8"
    LC_ADDRESS="ru_RU.UTF-8"
    LC_TELEPHONE="ru_RU.UTF-8"
    LC_MEASUREMENT="ru_RU.UTF-8"
    LC_IDENTIFICATION="ru_RU.UTF-8"
    LC_ALL=
    $ ls ф*
    ффф.текст
    $ cat ффф.текст
    ффф.результат
    $ ./a.out
    ффф.результат
    $ ls ф*
    ффф.результат  ффф.текст

2:

    #include <fstream>
    #include <string>
    #include <iostream>

    int main() {
        std::ifstream rfile;
        rfile.open( "ффф.текст" );
        std::string line;
        getline(rfile, line);
        rfile.close();
        std::cout << line << std::endl;
        std::ofstream wfile;
        wfile.open( line.c_str() );
        wfile.close();
    }
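The locale check from step 1 can also be done from inside the program. A minimal sketch, assuming a POSIX system (`nl_langinfo` is declared in `<langinfo.h>`); `current_codeset` is a hypothetical helper name:

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Name of the character encoding of the locale taken from the environment.
// On a system configured like the one above this would be expected to be "UTF-8".
std::string current_codeset() {
    std::setlocale(LC_ALL, "");      // adopt LANG / LC_* from the environment
    return nl_langinfo(CODESET);
}
```

If the reported codeset is not UTF-8, file names built from UTF-8 bytes may be displayed incorrectly by tools that follow the locale.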
  • It reads, and it even reads Russian, oddly enough. Yesterday I tried a simple cout << "Лол"; and it printed gibberish - Herrgott
  • There are two possibilities: either the system locale is not UTF-8, or the source file containing "Лол" is in a different encoding. P.S. What it "reads" in the example is not so important; what matters is which file gets created. - PinkTux
  • So I still don't understand what to do. How do I save a file with a Russian name and Russian content? - Herrgott
  • To begin with, read my answer and do what it says step by step, starting with the first command (check your locale). Then create a file ффф.текст and write the line ффф.результат into it. Build and run the source. See what it prints. Then look at the output of the command ls ф* in the directory you ran it from. Report back. (Is it OK that I have just rewritten my answer all over again? I won't do it a third time, sorry...) - PinkTux
  • It doesn't matter how you build it. What happens next? Although you could also show the gcc command line and its output. - PinkTux