I am writing now a task for which you need to be able to quickly find the largest common prefix of two lines.

I decided to do this using a binary search: we look through the length of the prefix and if the hash of this prefix for two lines is the same, then we move the borders to the right to find the greater length of the prefix, and otherwise we move to the left to see the smaller length of the prefix.

The polynomial hash just allows you to find the hash of the entire string first, and then find the hash of the substring after O (1), which is convenient for comparing these very substrings.

But there is a problem: for some reason, my binary search finds the wrong prefix length and I can’t find it out.

What is the problem?

Prefix search:

int d(string &a, string &b) { StrHash ha(a), hb(b); int mid, l = 0, r = min(a.size(), b.size()) - 1; while(l < r) { mid = (l + r) / 2; if(ha.get(0, mid) == hb.get(0, mid)) { l = mid + 1; } else { r = mid - 1; } } int common; ///А тут мы проверяем, нашли мы префикс или нет. if(ha.get(0, l) == hb.get(0, l)) common = l + 1; else common = 0; return common; } 

A test with a bug that finds the answer 0 :

 s1 = 'aaaaa' s2 = 'ab' 
  • So how long does it take to initialize your hash, so that you can "find the hash of a substring" after O (1)? If hash initialization requires a linear pass of the entire line, then what's the point in all this, if you could immediately find a prefix with a linear pass? - AnT
  • And, I see the discussion below ... That is, there is no substantiated answer, except for “I believe” ... Refuse. Competently implemented direct comparison will work much faster. - AnT

1 answer 1

I found a solution. I think it’s more like a crutch, but still.

I suggested that l at the end of the binary search algorithm may be more than necessary, I really do not know, in any case, only by one. I suggested this based on the fact that in a more classical binary search, the answer can be either l = m + 1 or simply m if the answer was found before leaving the while .

Replaced in the end it:

 if(ha.get(0, l) == hb.get(0, l)) common = l + 1; else common = 0; 

On this:

 int common; int variant1 = l, variant2 = l - 1; if(ha.get(0, variant1) == hb.get(0, variant1)) { common = variant1 + 1; } else { if(l - 1 >= 0 && ha.get(0, variant2) == hb.get(0, variant2)) { common = variant2 + 1; } else { common = 0; } } 
  • Comments are not intended for extended discussion; conversation moved to chat . - Yuriy SPb