There is a folder containing several thousand files of the same type but with different names and sizes. When the program runs, the user enters a size in MB (for example, 10 MB). The program then copies randomly chosen files from the source folder to the destination folder until the size of the destination folder becomes greater than or equal to the entered value (for example, 10 MB).

  • Well, actually, the question already describes everything correctly, exactly in the order you would write the code, and it translates into code very easily. What problems exactly made you ask this question? - andreymal
  • (although there is one nuance, but it is insignificant and will be discussed later) - andreymal
  • @andreymal what exactly is the nuance? - MichaelPak
  • @MichaelPak whether to measure the directory size as the sum of the file sizes in bytes, or as the clusters they occupy on the file system. If you copy 10 megabytes of, say, 512-byte files, they will actually occupy about 80 megabytes when clusters are 4 kilobytes each .) - andreymal
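The cluster effect andreymal describes can be sketched like this (a minimal illustration assuming a 4 KiB cluster; the actual cluster size depends on the file system):

```python
import math

def on_disk_size(file_size, cluster_size=4096):
    """Bytes a file actually occupies when storage is allocated
    in whole clusters: size rounded up to a cluster multiple."""
    return math.ceil(file_size / cluster_size) * cluster_size

# 10 MiB worth of 512-byte files...
n_files = (10 * 1024 * 1024) // 512   # 20480 files
actual = n_files * on_disk_size(512)  # each one occupies a full 4 KiB cluster
print(actual // (1024 * 1024))        # -> 80 (MiB actually occupied on disk)
```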

2 answers

Something like this:

```python
import os, random, shutil

old_dir = "old/"  # the old (source) directory
new_dir = "new/"  # the new (destination) directory
size = 10 * 1024 * 1024  # getsize() returns bytes

file_list = os.listdir(old_dir)  # list of files from the old directory
while sum(os.path.getsize(f) for f in os.listdir(new_dir)) < size:
    file = file_list.pop(random.randint(0, len(file_list)))
    shutil.copy(old_dir + file, new_dir + file)
```
  • 1- This will not work: os.listdir() returns bare file names relative to the directory, so os.path.getsize(f) looks in the current working directory. 2- L.pop(randint(0, len(L))) will not work, because len(L) is an invalid index; randrange() can be used instead, or it is simpler to call shuffle(L) once. 3- This is an O(n*k) (quadratic) algorithm (n is the number of files in the old directory, k is the number of files in the new directory). It can be improved to an O(n) linear algorithm - jfs
  • @jfs, firstly, you can simply run the script from the root directory. And secondly: yes, your version is much more efficient, +1 ;) - MichaelPak
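Incorporating jfs's three points, the first answer could be sketched as follows (a hypothetical corrected version, not the original author's code: full paths via os.path.join, and a single shuffle instead of repeated randint):

```python
import os
import random
import shutil

def copy_random_files(old_dir, new_dir, limit):
    """Copy randomly chosen files from old_dir to new_dir until
    the total size of the copied files reaches limit bytes."""
    names = os.listdir(old_dir)
    random.shuffle(names)  # one shuffle up front: O(n), no index errors
    copied = 0
    for name in names:
        src = os.path.join(old_dir, name)  # full path, not a bare name
        shutil.copy(src, os.path.join(new_dir, name))
        copied += os.path.getsize(src)
        if copied >= limit:
            break
    return copied
```

The running total also avoids re-scanning the destination directory on every iteration, which is what made the original loop quadratic.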

To copy randomly selected *.type files from the source folder to the destination folder, so that the total size of the copied files does not exceed the limit in bytes given on the command line (limit = 10 MiB by default):

```python
#!/usr/bin/env python3
"""Usage: copy-random-files [<size-limit-MiB>]"""
import random
import shutil
import sys
from pathlib import Path

MiB = 1024 * 1024
limit = (int(sys.argv[1]) if len(sys.argv) > 1 else 10) * MiB

size = 0
paths = list(Path("source").glob("*.type"))
random.shuffle(paths)
for path in paths:
    size += path.stat().st_size
    if size > limit:
        break
    shutil.copy(str(path), str(Path("destination") / path.name))
```

This algorithm is O(n) linear in memory and time, where n is the total number of *.type files in the source folder.

  • "set() guarantees a random order every time the program is started" - I'm not sure it is a good idea to rely so brazenly on a specific implementation - PyPy3 produces the same ordering on every run (although I did not find pathlib in it, but that is another story) - andreymal
  • @andreymal: PyPy supports pure Python 2 code well; its Python 3 support, which my answer obviously requires, is poor. That is a PyPy problem (my code does not depend on CPython implementation details). You could easily write an answer that works on both Python 2 and 3 (for example, using random.shuffle()). The purpose of the code here is to be executable pseudo-code (it is enough that it reads well and works on standard Python). - jfs
  • For "executable pseudo-code", that is all the worse. - andreymal
  • 1- "that is all the worse" - explain what you mean, point by point. 2- "Like the use of <<" - that is exactly why 10 MiB is written next to it, for beginners (experienced developers read 1 << 20 just fine). - jfs
  • I mean that if you are writing pseudo-code, rather than a production-ready super-version, then you need to write it as clearly and simply as possible - andreymal