There is a folder containing several thousand files of the same type, but with different names and sizes. When you run the program, you need to enter a size in MB (for example, 10 MB). During execution, files from the source folder are copied at random to the specified folder until the size of that folder becomes greater than or equal to the size entered earlier (for example, 10 MB).
- Well, actually, everything is described correctly in the question: write the code exactly as described and it translates very easily. What exactly were the problems that required creating this question? - andreymal
- (although there is one nuance, but it is insignificant and will be discussed later) - andreymal
- @andreymal what exactly is the nuance? - MichaelPak
- @MichaelPak whether to measure the size of the directory as the sum of the bytes in the files or by the clusters occupied on the file system. If you collect 10 megabytes of files of, for example, 512 bytes each, they will actually occupy about 80 megabytes if clusters are 4 kilobytes each. - andreymal
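The arithmetic in the last comment can be checked with a small sketch. The function names and the 4096-byte cluster size are assumptions made for illustration; real file systems may allocate space differently (for example, by storing tiny files inline).

```python
def apparent_size(sizes):
    """Total size as the plain sum of file lengths in bytes."""
    return sum(sizes)


def allocated_size(sizes, cluster=4096):
    """Total size if every file occupies a whole number of clusters.

    The 4096-byte cluster is an assumption; real file systems differ.
    """
    # Round each file up to the next multiple of the cluster size.
    return sum(-(-size // cluster) * cluster for size in sizes)


MiB = 1024 * 1024
sizes = [512] * (10 * MiB // 512)  # 20480 files of 512 bytes each
print(apparent_size(sizes) // MiB)   # 10
print(allocated_size(sizes) // MiB)  # 80
```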
2 answers
Something like this:
    import os, random, shutil

    old_dir = "old/"  # source directory
    new_dir = "new/"  # destination directory
    size = 10 * 1024 * 1024  # getsize() returns bytes
    file_list = os.listdir(old_dir)  # list of files in the source directory
    while sum(os.path.getsize(f) for f in os.listdir(new_dir)) < size:
        file = file_list.pop(random.randint(0, len(file_list)))
        shutil.copy(old_dir + file, new_dir + file)

- 1- This will not work: os.listdir() returns relative paths. 2- L.pop(randint(0, len(L))) will not work because len(L) is an invalid index. randrange() can be used instead, or it is simpler to call shuffle(L) once. 3- This is an O(n*k) (quadratic) algorithm (n is the number of files in the old directory, k is the number of files in the new directory). It can be improved to an O(n) linear algorithm - jfs
- @jfs, firstly, you can simply run the script from the root directory. And secondly: yes, your version is much more efficient, +1 ;) - MichaelPak
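A minimal sketch of how the three points from the comment above might be applied to this answer, keeping its os-based style. The function name copy_random_files is hypothetical, and the sketch assumes the destination folder already exists and starts out empty, so a running total can replace rescanning it on every step.

```python
import os
import random
import shutil


def copy_random_files(old_dir, new_dir, limit):
    """Copy random files from old_dir to new_dir until limit bytes are reached.

    Returns the total number of bytes copied.
    """
    file_list = os.listdir(old_dir)
    random.shuffle(file_list)  # one shuffle instead of pop(randint(...))
    copied = 0  # running total, so new_dir is not rescanned on every step
    for name in file_list:
        if copied >= limit:
            break
        src = os.path.join(old_dir, name)  # listdir() returns bare names
        if not os.path.isfile(src):
            continue  # skip subdirectories and other non-files
        shutil.copy(src, os.path.join(new_dir, name))
        copied += os.path.getsize(src)
    return copied
```

As in the question, copying stops only once the total is greater than or equal to the limit, so the last file copied may push the total above it.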
To copy randomly selected *.type files from the source folder to the destination folder so that the total size of the copied files does not exceed the limit given in MiB on the command line (10 MiB by default):
    #!/usr/bin/env python3
    """Usage: copy-random-files [<size-limit-MiB>]"""
    import random
    import shutil
    import sys
    from pathlib import Path

    MiB = 1024 * 1024
    limit = (int(sys.argv[1]) if len(sys.argv) > 1 else 10) * MiB
    size = 0
    paths = list(Path("source").glob("*.type"))
    random.shuffle(paths)
    for path in paths:
        size += path.stat().st_size
        if size > limit:
            break
        shutil.copy(str(path), str(Path("destination") / path.name))

- To take all entries from the folder indiscriminately, you can use Path("source").iterdir() instead of Path("source").glob("*.type")
- random.shuffle() puts the paths in random order
- The folder size is calculated as the sum of the sizes of the files it contains in this case. Actual disk space may vary. See "Find the total size of all regular files in a directory, recursively bypassing all subdirectories".
This algorithm is O(n), linear in memory and time, where n is the total number of *.type files in the source folder.
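The folder size as a sum of file sizes, mentioned in the notes above, could be computed recursively along these lines. This is only a sketch: total_size is a hypothetical helper, it counts apparent sizes (st_size) rather than occupied disk space, and is_file() follows symbolic links, so a linked file is counted as well.

```python
from pathlib import Path


def total_size(folder):
    """Sum of st_size over all regular files under folder, recursively."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())
```

For example, total_size("destination") after a run of the script should not exceed the chosen limit, provided the folder was empty beforehand.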
- "set() guarantees a random order every time the program is started" - I'm not sure it's good to be so brazenly tied to a specific implementation - PyPy3 shuffles the same way every time (although I didn't find pathlib in it, but that's another story) - andreymal
- @andreymal: PyPy supports pure Python 2 code well. Its support for Python 3, which my answer obviously uses, is poor. This is a PyPy problem (my code does not rely on CPython implementation details). You can easily write an answer that works on both Python 2 and 3 (for example, using random.shuffle()). The purpose of the code in this case is to be executable pseudocode (it is enough that it is readable and works on standard Python). - jfs
- For "executable pseudocode", this is all the worse. - andreymal
- 1- "this is all the worse" - what do you mean, point by point? 2- "Like the use of <<" - that is why 10MiB is indicated next to it for beginners (experienced developers read 1 << 20 as is). - jfs
- I mean, if you write pseudocode, and not a production-ready super-version, then you need to write it as clearly and understandably as possible - andreymal
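The two spellings discussed in these comments can be compared directly. This is only an illustrative sketch with made-up file names: 1 << 20 and 1024 * 1024 are the same number, and random.shuffle() produces a random permutation without depending on how a particular interpreter orders a set.

```python
import random

MiB = 1 << 20
assert MiB == 1024 * 1024 == 2 ** 20  # three spellings of one mebibyte

names = ["a.type", "b.type", "c.type", "d.type"]  # made-up file names
shuffled = names[:]
random.shuffle(shuffled)  # in-place random permutation
print(shuffled)
```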