I'm looking for the fastest algorithm for finding a string in an array of strings, which is the most different from the first row of an array. The strings are all the same length. The position of the characters and their ASCII code is of fundamental importance. If you solve the problem in the forehead, then you need to go through each line of the array and compare each of its characters with the character of the first line with the same index. If they are different, the counter for this row in the array is incremented. The largest counter gives us the string that is most different from the first.

  • 6
    And much more likely you will fail. You still need to compare all lines with the first. The only minor improvement is that you can remember the current “longest” distance, and if for the next line you see that it cannot be reached (too many identical characters), stop scanning this line. - VladD
  • I forgot to clarify, all the lines are of the same length and the position of the characters in it is fundamentally important. - Vyacheslav Dmitryukov
  • Clarify this by editing the question. Most here do not read other people's comments. - Max ZS
  • Specify the criterion for the difference lines. - Cerbo 8:36 pm
  • It seems from the task it is clear that it is important for me that the characters with the same index do not match? Or suggest how to reformulate the question itself so that it is understandable. - Vyacheslav Dmitryukov

1 answer 1

You can use the following algorithm:

std::vector<std::string> strings ... std::vector<size_t> distances; for(size_t i = 1; i < strings.size(); ++i) { std::deque<bool> tmp(strings[i].size()); for(size_t j = 0; j < tmp.size(); ++j) tmp[j] = static_cast<char>(strings[i][j] == strings[0][j]); auto distance = count(begin(tmp), end(tmp), false); distances.push_back(distance); } auto idx = distance(begin(distances), max_element(begin(distances), end(distances))) + 1; auto result = strings[idx]; 

This code is very easy to rewrite for vector instructions and its speed can grow many times, but for this you need to know the size of the lines (you know it, I don’t)

  • In my project, the length of these lines is given by a constant which is a multiple of three and four. The magnitude of the order of 12,000 characters. The exact length is still undecided. Tell me why you need to know the size of the lines, and if it's not difficult for you, give an example of using vector instructions? - Vyacheslav Dmitryukov
  • @ Vyacheslav Dmitryukov, it depends on the architecture on which the program will be used (I mean the vector instructions used). The length is important because vector instructions work with lengths that are multiples of 2 (not all, of course), so you need to divide the string into pieces that can be processed with vector instructions. - ixSci