How to delete from one file the lines listed in another file?

Question

How can I delete from file1 (файл1 in the commands below), which contains lines of the form почта:имя:фамилия:возраст (email:first name:last name:age), the lines whose email is listed in file2 (файл2), which contains only the emails?

I found a solution, but I don't like it: it takes too long and rewrites the file a million times (once for every line of file2). I'm afraid I'll kill it :)

 for a in `cat файл2`; do sed -i -e "/$a/d" файл1; done

I tried it, but for some reason that didn't work either: it started creating a file whose size kept growing rapidly.
@donRumata, then the result would contain lines from the source file duplicated many times.

Answer 1 · 2017-02-11T12:11:26

This can be done in a single pass over файл1:

 $ sed -i -f <(...) файл1

where ... is a command that generates a list of sed commands, one per line of файл2. They can have the same form as yours, /search string/d, or be made slightly more precise, /^search string:/d, by anchoring the match to the beginning of the line and to the : separator.

That is, in place of ... I suggest something like sed 's,^,/^,;s,$,:/d,' файл2, which prepends /^ and appends :/d to every line of файл2.
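To see what the generated sed program looks like, the generator can be run on its own. A minimal sketch, assuming the two sample emails from the example further down; they are hard-coded in a shell variable here instead of being read from файл2:

```shell
#!/bin/sh
# Sample contents of файл2 (assumed: one email per line).
emails='почта2
почта5'

# Turn each email into a sed "delete" command anchored at the start of the
# line and terminated by the ':' field separator: /^EMAIL:/d
script=$(printf '%s\n' "$emails" | sed 's,^,/^,;s,$,:/d,')

printf '%s\n' "$script"
```

This prints one command per email, /^почта2:/d and /^почта5:/d, and that whole list is what sed -f then applies to every line of файл1.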

Putting it together:

 $ sed -i -f <(sed 's,^,/^,;s,$,:/d,' файл2) файл1

If the files contain the following lines:

 $ cat файл1
 почта1:имя1:фамилия1:возраст1
 почта2:имя2:фамилия2:возраст2
 почта3:имя3:фамилия3:возраст3
 почта4:имя4:фамилия4:возраст4
 почта5:имя5:фамилия5:возраст5
 $ cat файл2
 почта2
 почта5

the result will be:

 $ sed -i -f <(sed 's,^,/^,;s,$,:/d,' файл2) файл1
 $ cat файл1
 почта1:имя1:фамилия1:возраст1
 почта3:имя3:фамилия3:возраст3
 почта4:имя4:фамилия4:возраст4
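The example can be reproduced in a scratch directory. This is a sketch of a portable variant rather than the exact command above: the generated program is written to a regular file (delete.sed, a name chosen for illustration) instead of the bash-only <(...) process substitution, and the result is printed instead of rewriting файл1 with -i, which GNU and BSD sed spell differently.

```shell
#!/bin/sh
set -e
# Throwaway directory so the sample files do not clobber anything real.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Sample data from the example above.
printf '%s\n' \
  'почта1:имя1:фамилия1:возраст1' \
  'почта2:имя2:фамилия2:возраст2' \
  'почта3:имя3:фамилия3:возраст3' \
  'почта4:имя4:фамилия4:возраст4' \
  'почта5:имя5:фамилия5:возраст5' > файл1
printf '%s\n' 'почта2' 'почта5' > файл2

# Same generator as in the answer; its output goes to an ordinary file.
sed 's,^,/^,;s,$,:/d,' файл2 > delete.sed

# One pass over файл1 applying every generated delete command.
result=$(sed -f delete.sed файл1)
printf '%s\n' "$result"
```

The printed lines match the expected contents of файл1 shown above.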

Just in case, a clarification: with the -i option, sed does still create a temporary file and writes the result of processing the source file into it. When processing finishes, the original file is removed and the temporary file is renamed to the original name.
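The -i behaviour described above can be spelled out by hand. A rough sketch, not sed's actual implementation (GNU sed, for instance, also preserves permissions and picks its own temporary name); the names sedtmp.XXXXXX and delete.sed are illustrative assumptions:

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Tiny sample input and a one-line generated delete program.
printf '%s\n' 'почта1:a' 'почта2:b' > "$dir/файл1"
printf '%s\n' '/^почта2:/d' > "$dir/delete.sed"

# Roughly what `sed -i -f delete.sed файл1` does: write the processed
# output to a temporary file next to the original...
tmp=$(mktemp "$dir/sedtmp.XXXXXX")
sed -f "$dir/delete.sed" "$dir/файл1" > "$tmp"
# ...and only when processing is done, rename it over the original.
mv -f "$tmp" "$dir/файл1"

result=$(cat "$dir/файл1")
printf '%s\n' "$result"
```

While sed is still running, the temporary file is the one that keeps growing; it becomes файл1 only at the final rename.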

I'm testing it and, damn, it takes a long time: file1 is 1 GB, file2 is 300 MB. I understand it won't be fast, but what bothers me is that it creates a file that grows by about 1 KB per second. What is that file?
@3amunyk, I explained what the -i option does at the end of the answer precisely for this.
@3amunyk, I suspect the second file contains duplicates.
It is better to remove them before generating the regular expressions:

 $ sed -i -f <(sort -u файл2 | sed 's,^,/^,;s,$,:/d,') файл1
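One more detail may matter for real addresses: the generated commands are regular expressions, so a . in an email matches any character, and /^a.b@example.com:/d would also delete a line for axb@example.com. A sketch, assuming GNU or POSIX sed and hypothetical sample emails, that escapes the basic-regex metacharacters and the / delimiter before wrapping each email:

```shell
#!/bin/sh
# Hypothetical sample emails, including a duplicate.
emails='a.b@example.com
a.b@example.com
c@example.org'

# 1. sort -u drops duplicate emails.
# 2. Backslash-escape [ \ . * ^ $ and the '/' delimiter so each email is
#    matched literally ('|' is the delimiter of this escaping command).
# 3. Wrap each email into an anchored delete command: /^EMAIL:/d
script=$(printf '%s\n' "$emails" | sort -u |
  sed -e 's|[[\.*^$/]|\\&|g' -e 's,^,/^,' -e 's,$,:/d,')
printf '%s\n' "$script"

# axb@example.com survives: the escaped dot no longer matches 'x'.
kept=$(printf '%s\n' 'a.b@example.com:x' 'axb@example.com:y' 'c@example.org:z' |
  sed "$script")
printf '%s\n' "$kept"
```

In the answer's command, the same escaping step would sit between sort -u and the wrapping sed inside <(...).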
Even if there are duplicates, do you think removing them will make it handle this volume noticeably faster?
-i is roughly equivalent to sed -e '...' < in > out && mv -f out in.
Though to be sure, one would have to run strace sed -i ... and see which system calls are made and in what order.
Different operating systems keep gaining interesting new system calls, and the file systems themselves do not stand still either.
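On the speed question: with the generated program, sed tries every delete command against every line of файл1, so the work grows with the product of the two file sizes, and removing duplicates only shrinks one factor. A different technique, not the one from the answer, is a hash lookup: load the emails from файл2 into an awk array once, then keep each line of файл1 whose first :-separated field is not in it. A sketch under the assumptions that the email is always the first field and that all of файл2 fits in memory:

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
  'почта1:имя1:фамилия1:возраст1' \
  'почта2:имя2:фамилия2:возраст2' \
  'почта3:имя3:фамилия3:возраст3' > "$dir/файл1"
printf '%s\n' 'почта2' 'почта2' > "$dir/файл2"

# While reading the first argument (файл2), store each email as a key of
# the array "drop"; duplicates just hit the same key.
# For the second argument (файл1), print a line only if its first
# ':'-separated field is not a stored key.
awk -F: 'FILENAME == ARGV[1] { drop[$0]; next } !($1 in drop)' \
  "$dir/файл2" "$dir/файл1" > "$dir/файл1.new"

# Replace the original only after awk has finished successfully.
mv -f "$dir/файл1.new" "$dir/файл1"
result=$(cat "$dir/файл1")
printf '%s\n' "$result"
```

The FILENAME == ARGV[1] test is used instead of the common NR == FNR idiom because the latter would treat файл1 as the list of emails if файл2 were empty, and would then print nothing.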
