I have a directory containing about 5 subdirectories, and each of those subdirectories contains a huge number of files. I need to check each file's descriptor and then take the necessary action, such as deleting the file.

This runs on a production server, and the CPU gets very heavily loaded when I do it the crude way, like this:

[self.handle_file(path) for path in glob.iglob(PATH + '/*/*')] 

How can I do this more efficiently?

  • In the general case (when the system is bottlenecked by IO calls, unlike the case in question), for example when there are a million files or the files are on a network share, os.scandir() can help. Here are some usage examples. - jfs
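For reference, a minimal sketch of what iterating one directory level with os.scandir() looks like (the function name list_files is hypothetical, not from the thread):

```python
import os

def list_files(directory):
    # os.scandir() yields DirEntry objects lazily, and entry.is_file()
    # can often answer from cached data without an extra os.stat() call
    with os.scandir(directory) as it:
        return [entry.name for entry in it if entry.is_file()]
```
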

1 answer

Never use a list comprehension just to perform side effects! First, it is an anti-pattern, and second, it is very inefficient. In your particular case, Python builds an in-memory list of the values returned by self.handle_file() for every path that .iglob() yields, and keeps that list around until the iteration is finished.
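A plain for loop over the generator avoids materializing any list at all; a minimal sketch (process and handle_file here are stand-in names for the asker's method and root path):

```python
import glob

def process(root, handle_file):
    # iglob() is lazy: the for loop pulls one path at a time,
    # so neither the paths nor the handle_file() results are
    # ever collected into a list in memory
    for path in glob.iglob(root + '/*/*'):
        handle_file(path)
```

In the asker's code this would be `process(PATH, self.handle_file)`.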

  • i.e., instead of square brackets use round ones; that creates a generator expression which produces values only when they are actually needed - gil9red
  • Better yet, use a classic for loop. Comprehensions are meant for building data without side effects. - Sergey Gornostaev
  • @SergeyGornostaev, I didn't understand the anti-pattern part. Is a list comprehension an anti-pattern in general, or just the way I used it? - faoxis
  • @SergeyGornostaev, yeah, and it still works well with os.walk :) - gil9red
  • @andreymal: os.walk() is limited by its interface for VERY large directories (it is forced to return the lists for an entire directory at once), though that rarely matters in practice. For your particular needs you can change the API and the implementation, for example, yield (parent, entry) pairs (where entry is an os.DirEntry) instead of (top, dirs, nondirs), although in specific cases (like the one in the link I mentioned above) the code can be even simpler while still supporting huge directories - jfs
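The (parent, entry) idea jfs describes can be sketched as a recursive generator built on os.scandir(); the name scantree is hypothetical, and this is only one possible shape of such an API:

```python
import os

def scantree(top):
    # Lazily yield (parent, entry) pairs for the whole tree.
    # Unlike os.walk(), nothing forces a full per-directory list:
    # os.scandir() streams entries one at a time.
    with os.scandir(top) as it:
        for entry in it:
            yield top, entry
            if entry.is_dir(follow_symlinks=False):
                yield from scantree(entry.path)
```

Consuming it stays a plain loop, e.g. `for parent, entry in scantree(PATH): ...`, with entry.is_file() / entry.path available on each os.DirEntry.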