I have a directory containing about 5 subdirectories, and each of those subdirectories contains a huge number of files. I need to check each file's descriptor and then take the necessary action, such as deleting the file.

This runs on a production server, and the CPU gets very heavily loaded when I do it the crude way, like this:

[self.handle_file(path) for path in glob.iglob(PATH + '/*/*')] 

How can I do this more efficiently?

  • In the general case (when the system is bottlenecked by IO calls, unlike the case in question), for example when there are a million files or the files are on a network share, os.scandir() can help. Here are some usage examples. - jfs
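For reference, a minimal sketch of what iterating one directory level with os.scandir() looks like (the function name list_files is hypothetical, not from the thread):

```python
import os

def list_files(directory):
    # os.scandir() yields DirEntry objects lazily, and entry.is_file()
    # can often answer from cached data without an extra os.stat() call
    with os.scandir(directory) as it:
        return [entry.name for entry in it if entry.is_file()]
```
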

1 answer

Never use a list comprehension just to perform side effects! First, it is an anti-pattern, and second, it is very inefficient. In your particular case, Python builds an in-memory list of the values returned by self.handle_file() for every path that .iglob() yields, and keeps that list around until the iteration is finished.
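A plain for loop over the generator avoids materializing any list at all; a minimal sketch (process and handle_file here are stand-in names for the asker's method and root path):

```python
import glob

def process(root, handle_file):
    # iglob() is lazy: the for loop pulls one path at a time,
    # so neither the paths nor the handle_file() results are
    # ever collected into a list in memory
    for path in glob.iglob(root + '/*/*'):
        handle_file(path)
```

In the asker's code this would be `process(PATH, self.handle_file)`.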

  • i.e., instead of square brackets use round ones; that creates a generator expression which produces values only when they are actually needed - gil9red
  • Better yet, use a classic for loop. Comprehensions are meant for building data without side effects. - Sergey Gornostaev
  • @SergeyGornostaev, I didn't understand the anti-pattern part. Is a list comprehension an anti-pattern in general, or just the way I used it? - faoxis
  • @SergeyGornostaev, yeah, and it still works well with os.walk :) - gil9red
  • @andreymal: os.walk() is limited by its interface for VERY large directories (it is forced to return the lists for an entire directory at once), though that rarely matters in practice. For your particular needs you can change the API and the implementation, for example, yield (parent, entry) pairs (where entry is an os.DirEntry) instead of (top, dirs, nondirs), although in specific cases (like the one in the link I mentioned above) the code can be even simpler while still supporting huge directories - jfs
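The (parent, entry) idea jfs describes can be sketched as a recursive generator built on os.scandir(); the name scantree is hypothetical, and this is only one possible shape of such an API:

```python
import os

def scantree(top):
    # Lazily yield (parent, entry) pairs for the whole tree.
    # Unlike os.walk(), nothing forces a full per-directory list:
    # os.scandir() streams entries one at a time.
    with os.scandir(top) as it:
        for entry in it:
            yield top, entry
            if entry.is_dir(follow_symlinks=False):
                yield from scantree(entry.path)
```

Consuming it stays a plain loop, e.g. `for parent, entry in scantree(PATH): ...`, with entry.is_file() / entry.path available on each os.DirEntry.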