Code optimization

Question

int main () { char psz[256]; cout<<"Vvedite predlozheniya\n"; fgets(psz, sizeof(psz), stdin); int nwords; cout<<"Vvedite kol-vo slov\n"; cin>>nwords; printf("string = %s\n\n",psz); for (int i = 0; psz[i] != '\0'; i++) { char pchars[] = "!?."; int k = i, cnt = 0; while (!strchr(pchars,psz[i]) && psz[i] != '\0') if (psz[i++] == ' ') cnt++; int n = k; if (k == 0) cnt++; char temp[256] = "\0"; int s = 0; while (n <= i && cnt == nwords) temp[s++] = psz[n++]; if (cnt == nwords) printf("%s\n",temp); } system("pause"); return 0; }

Greetings. Is it possible to somehow simplify or maybe even implement in another way? Here sentences and numbers are entered, then only those sentences in which the number of words is equal to the entered number are displayed on the screen.

@RubyNut, a few questions, by answering to which you correct the program yourself: 1) what will happen if the words are separated not by one, but by several spaces?
4) why should it be copied at all if the indices of its beginning and end are known, and there is also a function putchar () (or cout << in C ++)?
@avp, thank you, very useful list of comments, now I will try to fix it

Accepted Answer · 2012-12-17T20:47:07

Option number 1 (with strtok_r ()). Proposed @LoOnly. Compact and logical (in my opinion) program. Linux, compiled g ++ and gcc.

In the future, I plan to show two more options and compare their performance.

 // tstrtok.c печать предложений с заданным (av[1]) количеством слов (av[1]==0 все) #include <stdio.h> #include <stdlib.h> #include <string.h> #define delims " \t\n.<,>?/~`!@$%^&*()-+=[]{}\\|;:'\"" #define sdelims ".?!;" char * trim (char *str) { char *s = str; while (*s && strchr(delims,*s)) s++; s = strdup(s); char *t = s+strlen(s)-1; while (t > s && strchr(delims,*t)) t--; *++t = 0; return s; } int main (int ac, char *av[]) { char text[1024], *sentence, *snt_copy, *psntc; int desiredWordsNumber = av[1]? atoi(av[1]):0, wordcnt; while (puts("Enter text"),fgets(text,1024,stdin)) { sentence = strtok_r(text,sdelims,&psntc); if (sentence) do { wordcnt = 0; snt_copy = strdup(sentence); char *pword, *word = strtok_r(snt_copy,delims,&pword); if (word) do { wordcnt++; } while(word = strtok_r(NULL,delims,&pword)); free(snt_copy); if (wordcnt && (!desiredWordsNumber || wordcnt == desiredWordsNumber)) { printf ("[%s]\n",snt_copy = trim(sentence)); free(snt_copy); } } while (sentence = strtok_r(NULL,sdelims,&psntc)); } return 0; }

We compile and run (first print all sentences, then only from 2 words)

 avp@avp-ubu1:~/hashcode$ g++ -O3 tstrtok.c -o tstrtok avp@avp-ubu1:~/hashcode$ ./tstrtok Enter text мама мыла раму, кот жрал мясо. папа ржал [мама мыла раму, кот жрал мясо] [папа ржал] Enter text avp@avp-ubu1:~/hashcode$ ./tstrtok 2 Enter text test, test, test; test, test??? yes test !!!! simple short test. [test, test] [yes test] Enter text avp@avp-ubu1:~/hashcode$

I think, in principle, everything is clear. Continuation (along with time measurements) follows ...

Promised continuation

I made another option (nword.c), from which I expected more speed, but it was 4 times slower than the above program with strtok_r ().

The idea, which at first seemed very sound, was to search for the beginning and end of words (by delimiters) and the analysis of delimiters, completing the word for the presence of the character of the end of a sentence. In this case, the index of the first word of the sentence was memorized, and when the end of sentence symbol was found, all of it was printed without intermediate copying.

Actually, the main loop of the program implementing this algorithm is also small. Here it is, with additions to the output in / dev / null when measuring runtime in a loop

  // собственно цикл подсчета слов в каждом предложениии и печать wordcnt = curpos = 0; while (getWordBeg(str,curpos,&start)) { if (!wordcnt) sentenceBegin = start; curpos = getWordEnd(str,start); wordcnt++; if (isSentence(str,&curpos)) { if (desiredWordsNumber < 1 || wordcnt == desiredWordsNumber) { int i = sentenceBegin, end = ctrim(str,i,curpos); fputc('[',out); while (i < end && str[i] && str[i] != '\n') fputc(str[i++],out); fputs("]\n",out); } wordcnt = 0; } }

16 lines, and in the program with strtok_r () 15 lines. The functions getWordBeg , getWordEnd , isSentence and ctrim obvious and use standard strchr , strspn and strcspn to search for delimiters.

Comparison of the speed of work showed a completely unexpected result for me .

 avp@avp-ubu1:~/hashcode$ ./tstrtok Enter text и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? [и пишу я тесты] [разные тесты] [тесты, тесты, тесты] [что толку от них] Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 497 msec Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 592 msec Enter text avp@avp-ubu1:~/hashcode$ gcc -O3 nwords.c avp@avp-ubu1:~/hashcode$ ./a.out Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 2238 msec Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 2243 msec

The program calls strtok and, logically, copying all the text into the internal buffer, was 4 times faster! .

An attempt to optimize nword by replacing the standard strchr , strspn and strcspn with options for finding delimiters in hash tables turned out to be quite boring.

 avp@avp-ubu1:~/hashcode$ gcc -O3 -DUSE_HASH nwords.c avp@avp-ubu1:~/hashcode$ ./a.out Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 4369 msec

It turned out that the implementation of defina-mi strchr , strspn and strcspn in /usr/include/bits/string2.h is very nontrivial and apparently really effective, and it’s possible to deal with hashing more closely.

The strtok_r gorgeous result, as it turned out due to gdb , is overshadowed by the fact that it modifies the source string , writing zeros in place of delimiters. Those. no copying takes place, but it is described at the end of the manpage (interestingly, I’m one so inconsiderate that I first take on the debugger, and then carefully read the man ???)

These are the words of their man 3 strtok

 BUGS Be cautious when using these functions. If you do use them, note that: * These functions modify their first argument. * These functions cannot be used on constant strings. * The identity of the delimiting character is lost.

I hope someone this small study is useful.

a continuation

@RubyNub , more about trim()

The fact is that strtok_r() in a loop by sentences can leave after the last word in a sentence word delimiters that are not sentence delimiters. In general, trim is made for "similarity" to the answer @gkuznets .

So, finding out the nature of the error that occurs when measuring time in the program with strtok_r ( strtok_r , this function writes zeros to the parsed text and does not restore it) had to remember all the entered text and restore it before each cycle. The program has acquired the following form, and the time of its work has increased to 1700-1800 milliseconds (a million cycles).

 // печать предложений с заданным (av[1]) количеством слов (av[1]==NULL все) #include <stdio.h> #include <stdlib.h> #include <string.h> #define delims " \t\n.<,>?/~`!@$%^&*()-+=[]{}\\|;:'\"" #define sdelims ".?!;" // копия предложения с обрезанием разделителей в начале и конце char * trim (char *str) { char *s = str, *t; while (*s && strchr(delims,*s)) // найдем начало текста (не разделитель) s++; /* копируем strlen, malloc, memcpy сделано вместо strdup, strlen чтобы убрать лишний проход по s (два в strdup + еще один в strlen) */ int l = strlen(s)+1; t = (char *)malloc(l); memcpy(t,s,l); s = t; t += (l-1); // позиция перед завершающим 0 while (t > s && strchr(delims,*t)) // пробегаем разделители в конце t--; *++t = 0; // "удаляем" их return s; } long long mutime(void); int main (int ac, char *av[]) { char text[1024], copy[sizeof(text)], *sentence, *snt_copy, *psntc; int desiredWordsNumber = av[1]? atoi(av[1]):0, wordcnt; // для замера времени FILE *out = stdout, *nullout = fopen("/dev/null","w"); int nloops, i; volatile int l = 1; long long tstart, dt, ct; while (puts("Enter text"),fgets(text,1024,stdin)) { // для замера времени if ((nloops = atoi(text)) < 1) nloops = 1; if (nloops > 1) { out = nullout; tstart = mutime(); l = strlen(text); for (i = 0; i < nloops; i++) memcpy(copy,text,l); ct = mutime()-tstart; printf ("copytime %d bytes %d loops %lld %s\n", l, nloops, ct < 10000? :ct/1000, ct < 10000? "usec" : "msec"); tstart = mutime(); } for (i = 0; i < nloops; i++, memcpy(text,copy,l)) { // собственно цикл подсчета слов в каждом предложениии и печать sentence = strtok_r(text,sdelims,&psntc); if (sentence) do { wordcnt = 0; snt_copy = strdup(sentence); char *pword, *word = strtok_r(snt_copy,delims,&pword); if (word) do { wordcnt++; } while(word = strtok_r(NULL,delims,&pword)); free(snt_copy); if (wordcnt && (!desiredWordsNumber || wordcnt == desiredWordsNumber)) { fprintf (out,"[%s]\n", snt_copy = trim(sentence)); free(snt_copy); } } while (sentence = strtok_r(NULL,sdelims,&psntc)); } // end for nloops, print time if (out == nullout) { out = stdout; dt = mutime()-tstart - ct; printf ("%d loops %lld %s\n", nloops, dt < 10000? :dt/1000, dt < 10000? "usec" : "msec"); } } return 0; } #include <sys/time.h> long long mutime() { struct timeval t; gettimeofday(&t, NULL); long long mt = (long long)t.tv_sec * 1000000 + t.tv_usec; return mt; }

Remembering that the author's question was about optimization (in terms of execution time?) Of the task made another version. The source data and the type of result are the same as in nwords and tstrtok , and the algorithm is slightly different, based on searching for indices (positions in the text) of the beginning and end of the word, plus optimized whether the next character is a separator. To do this, we use arrays of 256 characters and the value of the character being checked as an index in such an array.

Here is the text of the fastest option.

 // ncwords.c считает количество слов в предложении #include <stdio.h> #include <stdlib.h> #include <string.h> // все разделители #define delims " \t\n.<,>?/~`!@$%^&*()-+=[]{}\\|;:'\"" // разделители предложений (подмножество delims) #define sdelims ".?!;" /** * makeset сделать множество однобайтных символов * @set : [out] результат: массив, индексируемый символами * @str : [in] строка, символы которой помещаем в set */ static void makeset (char set[256], const char *str) { memset(set,0,256); while (*str) set[*(u_char *)str] = *str++; } /** * isSentence ищет в text символ конца предложения среди разделителей * text [in] текст со словами * wordDelimiters [in] множество разделителей слов * sentenceDelimiters [in] множество разделителей предложений * (должно быть подмножеством wordDelimiters) * endWord [in] позиция первого разделителя после слова в text * pos [out] указатель на int, куда заносится позиция * первого символа из sentenceDelimiters среди * последовательности символов из wordDelimiters * или первого символа, не входящего в wordDelimiters * * возвращает 1, если конец предложения найден и 0 если не найден */ static int isSentence (char *text, char wordDelimiters[256], char sentenceDelimiters[256], int endWord, int *pos) { do { if (sentenceDelimiters[(u_char)text[endWord]] || !text[endWord]) { *pos = endWord; return 1; } endWord++; } while (wordDelimiters[(u_char)text[endWord]] || !text[endWord]); *pos = endWord; return 0; } /** * getWord ищет позиции начала и конца слова в text * text [in] текст со словами * wordDelimiters [in] множество разделителей слов * start [in,out] позиция в text, начиная с которой ищем слово * при возврате это позиция начала слова или конца text * end [out] сюда заносим позицию первого разделителя после слова * или позицию конца text * * возвращает 1, если слово найдено или 0 если слов больше нет */ int getWord (char *text, char wordDelimiters[256], int *start, int *end) { int i = *start; while (wordDelimiters[(u_char)text[i]]) i++; *start = i; if (!text[i]) { *end = i; return 0; } while (text[i] && !wordDelimiters[(u_char)text[i]]) i++; *end = i; return 1; } long long mutime(void); // текущее время в микросекундах (realtime) /* * ./a.out [desiredWordsNumber] вывод в stdout предложений из строки stdin, * содержащих desiredWordsNumber слов. * При вызове без аргументов выводит в stdout все предложения. * Если строка текста начинается с ЧИСЛА, то предложения * в stdout не выводятся, а производится измерение времени исполнения * ЧИСЛА циклов программы. * * основной алгоритм * В цикле из введенного в строке текста выбираем очередное слово, * считаем слова в предложении. * Начало первого слова в предложении запоминаем, * в качестве начала предложения. * После каждого слова проверяем наличие одного из символов конца * предложения. Если предложение содержит desiredWordsNumber слов, * то выводим его в stdout и сбрасываем счетчик слов в предложении. */ int main (int ac, char *av[]) { char *text = NULL, wordDelimiters[256], sentenceDelimiters[256]; size_t textSize; int wordStart, endWord, curpos, wordCnt, sentenceBegin, desiredWordsNumber = av[1]? atoi(av[1]):0; makeset(wordDelimiters,delims); makeset(sentenceDelimiters,sdelims); // для замера времени FILE *out = stdout, *nullout = fopen("/dev/null","w"); int nloops, i; long long tstart; while (puts("Enter text"),getline(&text,&textSize,stdin) > 0) { // для замера времени if ((nloops = atoi(text)) < 1) nloops = 1; if (nloops > 1) { out = nullout; tstart = mutime(); } for (i = 0; i < nloops; i++) { // собственно цикл подсчета слов в каждом предложениии и печать wordCnt = wordStart = 0; while (getWord(text,wordDelimiters,&wordStart,&endWord)) { if (!wordCnt) sentenceBegin = wordStart; wordCnt++; if (isSentence(text, wordDelimiters, sentenceDelimiters, endWord, &curpos)) { if (desiredWordsNumber < 1 || wordCnt == desiredWordsNumber) { int i; for (i = sentenceBegin, fputc('[',out); i < endWord; i++) fputc(text[i],out); fputs("]\n",out); } wordCnt = 0; } wordStart = curpos; } } // конец цикла замера времени if (out == nullout) { out = stdout; long long dt = mutime()-tstart; printf ("%d loops %lld%s\n", nloops, dt < 10000? :dt/1000, dt < 10000? " usec" : " msec"); } } return 0; } #include <sys/time.h> long long mutime() { struct timeval t; gettimeofday(&t, NULL); long long mt = (long long)t.tv_sec * 1000000 + t.tv_usec; return mt; }

Now the results (time) of the three options.

 avp@avp-ubu1:~/hashcode$ g++ -O3 nwords.c -o nwords avp@avp-ubu1:~/hashcode$ ./nwords Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 2222 msec Enter text avp@avp-ubu1:~/hashcode$ g++ -O3 tstrtok.c -o tstrtok avp@avp-ubu1:~/hashcode$ ./tstrtok Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? copytime 129 bytes 1000000 loops 21 msec 1000000 loops 1795 msec Enter text avp@avp-ubu1:~/hashcode$ g++ -O3 ncwords.c -o ncwords avp@avp-ubu1:~/hashcode$ ./ncwords Enter text 1000000 и пишу я тесты. разные тесты. тесты, тесты, тесты... что толку от них? 1000000 loops 1211 msec Enter text avp@avp-ubu1:~/hashcode$

It would be interesting to compare with the boost option proposed by @gkuznets . Naturally, the option should not leave behind a modified initial text.

None of the fans of C ++ do not want? And then I have something went wrong with a boost.

@RubyNub, from man strtok : char * strtok_r (char * str, const char * delim, char ** saveptr);
... The strtok_r () function is a reentrant version strtok ().
.... Different strings may be parsed concurrently using sequences to specify different saveptr arguments.
The middle of the explanation (from man ) was thrown out, because
(continued) In short, strtok works like this: On the first call, it finds the first word delimited by delimiters, writes 0 to str after it, stores this address in its internal static char * variable, and returns the address of the beginning of the word.
In the second (etc.) call, instead of str, it is passed NULL, and to search for the next word, it uses the address stored in it.
And in strtok_r instead of the internal variable, we strtok_r ours into it.
When we call strtok_r and str NULL, the address from saveptr is taken to search for the next word.
But I don’t understand either the principle or the algorithm of his work.
@RubyNub, in C, C ++ everywhere, where you can write an expression, you can write a list of expressions (comma-separated).
The result of the list is the result of the last expression.
Accordingly, (puts("Enter text"),fgets(text,1024,stdin) means: call the puts() function and then the fgets() function and while will be executed until the result of fgets is zero (i.e., until the end of the input.) Just so compactly recorded the "invitation" with the subsequent input of the string.

Answer 2 · 2012-12-15T22:06:29

You can implement a different, more readable, way. Only most likely it is not what you need (-;

 #include <iostream> #include <vector> #include <boost/algorithm/string/split.hpp> #include <boost/algorithm/string/trim.hpp> using namespace boost; using namespace boost::algorithm; int wordsNumber(const std::string& sentence) { auto wordsIterator = make_split_iterator( sentence, token_finder(is_any_of(" "), token_compress_on)); return std::distance(wordsIterator, decltype(wordsIterator)()); } int main() { std::cout << "Enter sentences\n"; std::string input; std::getline(std::cin, input); std::cout << "Enter desired words number\n"; int desiredWordsNumber; std::cin >> desiredWordsNumber; std::vector<std::string> sentences; boost::split(sentences, input, is_any_of("!?.")); for (const auto& sentence: sentences) { std::string trimmedSentence = trim_copy(sentence); if (wordsNumber(trimmedSentence) == desiredWordsNumber) std::cout << trimmedSentence << "\n"; } }

And, of course, the code is intuitive, but are you sure that all the subtleties of calling non-standard functions are obvious without comments?
But for this for (const auto & sentence: sentences) {which compiler is needed?
I am sure that the subtleties of calling non-standard functions are not obvious without reading the documentation.
+1, but you have too formally approached the notion of "sentence."
I would, as an improvement in heuristics, require a space after the period, etc., and a capital letter after it.
Quotation marks, parentheses and direct speech further complicate the task.
It seems to me that the total number of letters is not so significant, the main thing is that there are fewer new entities.
Example: if I connect "mystringtools.h" , and from there size_t getWordCount(const std::string&) or size_t getWordCount(const char*) imported, the comprehensibility of the code is no worse, regardless of C ++ or C-style.
The function itself can be implemented in different ways, again, the main thing is to be understood.

Deleted 351 1 golden mark 5 silver marks 13 bronze marks · Answer 3 · 2012-12-16T11:06:14

See strtok from string.h.
Here, for example, how many words can be searched:

 unsigned wordsCount( string str, string separators ) { char *word = new char[ str.size() + 1 ]; strcpy( word, str.c_str() ); unsigned count = 0; word = strtok( word, separators.c_str() ); while( word ) { ++count; word = strtok( NULL, separators.c_str() ); } delete [] word; return count; }

Example of use:

 string inputStr = "We think in generalities, but we live in details."; cout << wordsCount( inputStr, ", " );

Note that in the call to strtok first parameter is char* ( NOT const char* ).

@LoOny: (since @gkuznets doesn’t prompt) you are not deleting the pointer that you selected - the bid!

Code optimization

3 answers 3

More articles: