Unexpected behavior of the strlen function (wrong count)

Question

I'm trying to write my own string class, and the strlen function is not behaving as I expect. What am I doing wrong?

#include <iostream>
#include <cstring>
#include <cstdlib>
using namespace std;

class my_str {
private:
    char* s;
public:
    my_str();
    my_str(const char *str);
    my_str(const my_str &ob);
    ~my_str();
    my_str &operator=(const my_str &ob);
    void my_strlen();
    void print_str();
};

my_str::my_str() {
    s = new char [1];
    strcpy (s, "");
    cout << "конструктор" << endl;  // "constructor"
}

my_str::my_str(const char *str) {
    cout << "конструктор парметризованный1" << endl;  // "parameterized constructor 1"
    s = new char [strlen(str)+1];
    strcpy (s, str);
    cout << strlen(s);
    cout << "конструктор парметризованный2" << endl;  // "parameterized constructor 2"
}

my_str::my_str(const my_str &ob) {
    s = new char [strlen(ob.s)+1];
    strcpy (s, ob.s);
    cout << "конструктор копии" << endl;  // "copy constructor"
}

my_str::~my_str() {
    if(s) delete [] s;
    cout << "деструктор" << endl;  // "destructor"
}

my_str &my_str::operator=(const my_str &ob) {
    cout << "=" << endl;
    cout << ob.s << endl;
    cout << strlen(ob.s) << endl;
    delete [] s;
    s = new char[strlen(ob.s)+1];
    strcpy(s, ob.s);
    return *this;
}

void my_str::my_strlen() {
    cout << strlen(s) << endl;
}

// Prints the string one char (one byte) at a time, separated by spaces.
void my_str::print_str() {
    if(s) {
        for(int i = 0; s[i]; i++) {
            cout << s[i] << " ";
        }
    }
    cout << "_" << endl;
}

int main() {
    my_str a("Привет "), b("всем!"), c;  // "Hello ", "everyone!"
    cout <<strlen("Привет ")<< endl;
    a.my_strlen();
    cout <<strlen("всем!")<<endl;
    b.my_strlen();
    c.my_strlen();
    c=a;
    c.my_strlen();
    c.print_str();
    a=b;
    a.my_strlen();
    a.print_str();
    return 0;
}

Running the program (compiled with g++) prints:

 конструктор парметризованный1
 13конструктор парметризованный2
 конструктор парметризованный1
 9конструктор парметризованный2
 конструктор
 13
 13
 9
 9
 0
 =
 Привет
 13
 13
 ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ _
 =
 всем!
 9
 9
 ▒ ▒ ▒ ▒ ▒ ▒ ▒ ▒ ! _
 деструктор
 деструктор
 деструктор

Why do I get 13 and 9 when it should be 7 and 5? And where do these strange characters come from when I print character by character rather than the whole string?

I changed the encoding first to ASCII, then to Windows-1251. strlen now counts correctly, but the Russian letters came out even worse:

 ▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 1 7▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 2 ▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 1 5▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 2 ▒▒▒▒▒▒▒▒▒▒▒ 7 7 5 5 0 = ▒▒▒▒▒▒ 7 7 ▒ ▒ ▒ ▒ ▒ ▒ _ = ▒▒▒▒! 5 5 ▒ ▒ ▒ ▒ ! _ ▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒

Now everything comes out as gibberish, and switching back to the previous encodings no longer helps.

I think your source file is in UTF-8, where each Russian letter takes more than one byte. strlen counts bytes, not characters, so a 7-character string with 6 Russian letters and 1 ASCII character gives 6 * 2 + 1 = 13, and "всем!" gives 4 * 2 + 1 = 9.
"Привет " in UTF-8 is D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82 20, which is 13 bytes. What is your OS, and which editor are you using?
To start, run an experiment: use only 7-bit ASCII characters, that is, the space, digits, punctuation marks, and upper- and lower-case English letters.
This means the editor's encoding is not a single-byte one.
At the command prompt, run chcp 1251; then the output of your program will be correct.
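The byte arithmetic above can be checked directly. The following is a minimal sketch, assuming the input is valid UTF-8; utf8_length and the constant hello are hypothetical names introduced for illustration and are not part of the question's code. The string is written as explicit byte escapes so the result does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): counts characters (code
// points) in a UTF-8 string by skipping continuation bytes, which always
// have the bit pattern 10xxxxxx. Assumes the input is valid UTF-8.
std::size_t utf8_length(const char* s) {
    std::size_t count = 0;
    for (; *s; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new character
        }
    }
    return count;
}

// "Привет " spelled out as the UTF-8 bytes listed above.
const char* const hello =
    "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 ";
```

With these definitions, std::strlen(hello) reports the 13 bytes seen in the question, while utf8_length(hello) reports the 7 characters the asker expected.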
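The strange characters in the character-by-character output have the same cause: print_str writes one byte at a time with a space in between, which splits every two-byte Russian letter into halves that are not valid on their own. A rough sketch of one way around this, assuming valid UTF-8 input and a terminal that expects UTF-8, is to group continuation bytes with their lead byte before printing. utf8_chars is a hypothetical helper, not something from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 string into whole characters.
// Each element holds one lead byte plus its continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8.
std::vector<std::string> utf8_chars(const std::string& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size(); ) {
        std::size_t j = i + 1;
        // Extend the current character over any continuation bytes.
        while (j < s.size() &&
               (static_cast<unsigned char>(s[j]) & 0xC0) == 0x80) {
            ++j;
        }
        out.push_back(s.substr(i, j - i));
        i = j;
    }
    return out;
}
```

Printing each element followed by a space keeps every letter intact, because no multi-byte sequence is interrupted.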
