There are 2 byte arrays. Pretty big. In both arrays there are all bytes (from 0 to 255).

We need an algorithm that finds matching chains in arrays so that:

  1. The first array was covered with chains of the second without intersections.
  2. Chains should be as small as possible (that is, the longest is in priority).

It is difficult to explain, so I will show by example.

There is a class for storing information about the chain.

public class PartInfo { public long Count { get; set; } public long Offset { get; set; } } 

there are 2 arrays

 var arr1 = new byte[]{2,3,5,2,7,3,9}; var arr2 = new byte[]{2,3,9,5,2,7,8,6,3,9}; 

As a result, I expect to receive PartInfo[] with this content.

 Count | Offset 2 | 0 3 | 3 2 | 8 

so that having the second array and this summary information I was able to fully restore the first one.

Lead time matters

If something is not clear, ask leading questions.

  • Well, what is your question, what do you want to see in the answer? - Kromster
  • @Kromster As a result, I want to get an algorithm. At least just in words ... but if it is in the form of a function on c #, then it's generally great - iRumba
  • You try to explain to us how you think to get it (I still do not understand, and I think not only to me), and the bonus is quite likely that you will already have an algorithm ready. - Sergey
  • 2
    In general, what will it be? Why is this all about? - Sergey
  • one
    Please add these and other details to the body of the question. - Kromster

1 answer 1

This problem is called LCS - "longest common subsequence" (the largest common subsequence), used mainly for searching in strings. The fastest algorithm has complexity O (n * m) , where n is the length of the first sequence, m is the length of the second sequence. For sequences of 1 KB will have more than a million. operations, for files in 1 MB, the number of operations will be already a trillion. Therefore, this is not applicable to files. In the practice of file compression dictionaries are used with small sequences of bits.

  • Let's not forget about optimization ... There are all sorts of trees there, etc. - iRumba
  • The @iRumba tree of segments does not help here. Believe me, everything has come up with us. If it's good with English, you can read all sorts of options here - RusArt
  • No, this task, though similar to LCS, is not equivalent to it. - Pavel Mayorov
  • segment trees may not help, but "and so on" is quite suitable. For example, creating a Dictionary <byte, List <long >> from the second array, where the key is a byte, and the value is a list of all its positions in the array, then the speed increases - iRumba
  • @iRumba is your rises. Because you have an exponential brute force. Algorithm based on dynamic programming such a trick will not help. - Pavel Mayorov