There are two files, order_fix.txt and listdir.txt. I need to check every line from listdir.txt and see whether the same line occurs in order_fix.txt. I wrote this script:

    end_list = open('end_list.txt', 'w')
    listdir = open('listdir.txt')
    order = open('order_fix.txt')
    for line in listdir.readlines():
        if line in open('order_fix.txt').read():
            end_list.write(line)

But it takes a very long time with ~2 million lines in listdir.txt — more than 5 hours. Is there any way to speed this up?

1 Answer

    Something like this:

     end_list = open('end_list.txt', 'w')
     listdir = open('listdir.txt')
     order_set = set(open('order_fix.txt').readlines())
     for line in listdir.readlines():
         if line in order_set:
             end_list.write(line)

    Your code re-reads the entire order_fix.txt for every line of listdir.txt, and reading from disk is an expensive operation. Therefore, where possible, you should read a file into RAM once and then work with the data already in memory.

    In addition, membership tests are much faster in a set than in a list, so if you need to search a large collection many times, it is better to build a set from it once and do all the lookups there.
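    A quick illustration of that difference (the sizes and values here are made up for the demo): a list membership test scans the list element by element, while a set lookup is a hash-table probe that takes roughly constant time on average.

        import timeit

        # Build a list and a set with the same 100,000 lines
        data = [str(i) + '\n' for i in range(100_000)]
        as_list = data
        as_set = set(data)

        needle = '99999\n'  # worst case for the list: it is at the very end
        t_list = timeit.timeit(lambda: needle in as_list, number=100)
        t_set = timeit.timeit(lambda: needle in as_set, number=100)

        # The set lookup is typically orders of magnitude faster
        print(t_set < t_list)

    With ~2 million lines in listdir.txt, that per-line difference is exactly what turns hours into seconds.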

    P.S. Please write back with how much faster this code runs compared to yours — I'm curious))

    UPD: I still don't see where your code closes the files. Are you doing that anywhere? If not: files should always be closed once you have finished working with them, and wherever possible it is better to work with files through a context manager (the `with` statement), which closes them for you automatically.
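    Here is a sketch of the same approach using context managers (the sample file contents are invented so the snippet is self-contained; the file names are taken from the question):

        # Create small sample files so the sketch can be run as-is
        with open('order_fix.txt', 'w') as f:
            f.write('apple\nbanana\ncherry\n')
        with open('listdir.txt', 'w') as f:
            f.write('banana\ndurian\napple\n')

        # Build the lookup set once; the context manager closes the file for us
        with open('order_fix.txt') as order:
            order_set = set(order)

        # Stream listdir.txt line by line (no readlines() needed) and write matches
        with open('listdir.txt') as listdir, open('end_list.txt', 'w') as end_list:
            for line in listdir:
                if line in order_set:
                    end_list.write(line)

    One caveat: lines are compared including the trailing newline, so if the last line of one file lacks a final '\n' it won't match the same text elsewhere; stripping with `line.rstrip('\n')` on both sides avoids that if it matters for your data.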