Preferably in Delphi. There are several files (more than two) that need to be compared, with the matching lines displayed. What I need is a list of the lines found in at least two files and, for each such line, the list of files in which it occurs.

Example:

File 1: aaa bbb ccc ddd
File 2: aaa xxx yyy zzz
File 3: aaa bbb rrr

Results:
aaa - File1, File2, File3
bbb - File1, File3

I tried to write it myself, but by evening my brain no longer works... I would be grateful for any ideas.

  • 1. For each file, build an array of all its lines. 2. Walk through the arrays, keeping a counter. - timka_s

2 answers

That algorithm will be slow on large amounts of data. It is better to compute a hash of each line and then work with arrays of numbers.

  • That is a fair remark. I would add that loading the files into memory in full is a very bad idea. Only the hashes and the compact information associated with them (file, offset) need to be kept in memory. Read and compare the actual lines only when the hashes match. For greater efficiency, you can keep an in-memory cache of the lines whose hashes matched. - avp
  • Well said, but I have no clue about either hashing or reading a file in parts)) - Vladyslav Matviienko
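
For anyone unfamiliar with either idea, here is a minimal sketch of both in Delphi, under some assumptions: a reasonably recent compiler (XE2 or later for the `System.*` unit names) and placeholder file names such as `file1.txt`. `ReportCommonLines` is a hypothetical helper, not something taken from the answers here. Instead of storing only hashes and offsets as suggested above, it uses a hash table (`TObjectDictionary`) keyed by the line itself, which is simpler but still keeps every distinct line in memory. The files are read line by line with `TStreamReader` rather than loaded whole.

```pascal
program CommonLines;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

// Hypothetical helper: prints every line that occurs in at least two of the
// given files, followed by the names of the files that contain it.
procedure ReportCommonLines(const FileNames: array of string);
var
  Index: TObjectDictionary<string, TStringList>; // line -> files containing it
  Files: TStringList;
  Reader: TStreamReader;
  Line: string;
  Pair: TPair<string, TStringList>;
  I: Integer;
begin
  // doOwnsValues makes the dictionary free the per-line file lists.
  Index := TObjectDictionary<string, TStringList>.Create([doOwnsValues]);
  try
    for I := Low(FileNames) to High(FileNames) do
    begin
      Reader := TStreamReader.Create(FileNames[I]);
      try
        // Read one line at a time instead of loading the whole file.
        while not Reader.EndOfStream do
        begin
          Line := Reader.ReadLine;
          if not Index.TryGetValue(Line, Files) then
          begin
            Files := TStringList.Create;
            Index.Add(Line, Files);
          end;
          // Files are processed one after another, so a line repeated inside
          // the current file can only match the most recently added name.
          if (Files.Count = 0) or (Files[Files.Count - 1] <> FileNames[I]) then
            Files.Add(FileNames[I]);
        end;
      finally
        Reader.Free;
      end;
    end;

    // Report only the lines seen in more than one file.
    for Pair in Index do
      if Pair.Value.Count > 1 then
        Writeln(Pair.Key, ' - ', Pair.Value.CommaText);
  finally
    Index.Free;
  end;
end;

begin
  // Placeholder file names; replace with real paths.
  ReportCommonLines(['file1.txt', 'file2.txt', 'file3.txt']);
end.
```

The dictionary does not keep lines in any particular order, so sort the output if order matters. `CommaText` also puts quotes around file names that contain spaces.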

I am answering my own question for the second time in a row) The answer comes to me as soon as I ask.

Here is the algorithm: I collect the text of all the files into one TStringList, sort it, and remove duplicates.

Next, I loop over the lines of this string list and look for each line in every file (after loading the file into a string list, sorting it, and removing duplicates). If the line is found in more than one file, I add the line and the list of those files to the result memo.

And here is the code itself:

    procedure TForm2.Button1Click(Sender: TObject);
    var
      all, f1, res: TStringList;
      i, j, rescount: Integer;
      resstr: string;
    begin
      all := TStringList.Create;
      f1 := TStringList.Create;
      res := TStringList.Create;
      try
        // Sorted lists with dupIgnore silently drop duplicate lines.
        all.Sorted := True;
        all.Duplicates := dupIgnore;
        f1.Sorted := True;
        f1.Duplicates := dupIgnore;

        // memo1 holds one file name per line; collect every distinct line.
        for i := 0 to memo1.Lines.Count - 1 do
        begin
          f1.LoadFromFile(memo1.Lines[i]);
          all.AddStrings(f1);
        end;

        // For each distinct line, count the files that contain it.
        for j := 0 to all.Count - 1 do
        begin
          resstr := '';
          rescount := 0;
          for i := 0 to memo1.Lines.Count - 1 do
          begin
            f1.LoadFromFile(memo1.Lines[i]); // reloaded on every pass
            if f1.IndexOf(all.Strings[j]) > -1 then
            begin
              resstr := resstr + ' ' + ExtractFileName(memo1.Lines[i]);
              Inc(rescount);
            end;
          end;
          if rescount > 1 then
            res.Add(all.Strings[j] + ' ' + resstr);
        end;
        memo2.Lines.AddStrings(res);
      finally
        // The original never freed these lists.
        res.Free;
        f1.Free;
        all.Free;
      end;
    end;
