We have a text file of the form:
foo
two word
bar
#
cat
tea - five o'clock
666
The lines are not necessarily words; they can contain any characters, and a single line can hold several words separated by spaces. The text file can be large, even huge: up to hundreds of gigabytes or a couple of terabytes.
What I want: to output random combinations of the lines of such a list, where each output line is made of a chosen number of lines from the list (see the example below).
You can take this script as a basis. It does everything needed except randomization: it outputs lines in a fixed order, going through all the options starting from the first line, like a brute-force generator. It has a built-in option for choosing how many lines of the list are combined in each output line, from a minimum to a maximum. Run it like this:
python3 script.py -f spisokslov.txt -min 2 -max 3
and the output will be:
foo two word
foo bar
foo #
foo cat
foo tea - five o'clock
foo 666
two word foo
two word bar
two word #
* some lines removed here to shorten the example; the last line is:
666 tea - five o'clock cat
The script inserts one space between the combined lines, with no space after the last one in the output line. This is optimal: if needed, the spaces can then be stripped in a pipe. In short, I need exactly the same thing plus random output, so that the lines come out not sequentially but in random order, with the same option for choosing the number of lines: -min 2 -max 3
tea - five o'clock two word # foo foo 666 # cat foo two word bar tea - five o'clock
The script produces only combinations without duplicates (unless, of course, the duplicates are in the text file itself). It would be good to keep this, but if that is hard to combine with randomization, it can be dropped.
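To make the request concrete, here is a minimal sketch of the random variant for a list small enough to fit in memory. It is not the script mentioned above: the function name `random_combination` and its parameters are made up for illustration, and it assumes that "without duplicates" means no line of the list is used twice inside one output line.

```python
import random

def random_combination(items, min_len, max_len, rng=random):
    """Return one output line built from min_len..max_len distinct items."""
    # Clamp the upper bound so sample() never asks for more items than exist.
    upper = min(max_len, len(items))
    # Choose how many items this output line will contain.
    k = rng.randint(min_len, upper)
    # sample() draws k different positions of the list, in random order.
    return " ".join(rng.sample(items, k))

# Hypothetical in-memory copy of the example list.
items = ["foo", "two word", "bar", "#", "cat", "tea - five o'clock", "666"]
for _ in range(5):
    print(random_combination(items, 2, 3))
```

Two things to be aware of in this sketch: the same output line can appear more than once across calls, and the length is picked uniformly between the minimum and maximum, so each individual short combination is more likely than each individual long one.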
As an alternative, the approach of the combinator.bin and combinator3.bin utilities from the hashcat-utils set in Kali Linux could be used as a basis. They also work sequentially, but combine the lines of 2 or at most 3 files with each other:
./combinator3.bin spisokslov1.txt spisokslov2.txt spisokslov3.txt
(source: combinator3.c)
Maybe this would be easier: create several separate text files, one per position in the combination, then pick a random line from the first list, then a random line from the second, and so on. However, for large lists the copies take up a lot of space ...
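A rough sketch of the "random line from each file" idea, under the assumption that the files are too large to load into memory. Instead of the lines themselves, it keeps only the byte offset of every line start (8 bytes per line) and seeks to a random offset on demand. All function names here are hypothetical, and nothing in it comes from hashcat-utils.

```python
import random
from array import array

def build_line_offsets(path):
    """Scan the file once and record the byte offset where each line starts."""
    offsets = array("Q")  # unsigned 64-bit integers, 8 bytes per line
    pos = 0
    with open(path, "rb") as f:
        for raw in f:
            offsets.append(pos)
            pos += len(raw)
    return offsets

def read_line_at(f, offset):
    """Seek to a recorded offset and return that line without its newline."""
    f.seek(offset)
    return f.readline().rstrip(b"\r\n").decode("utf-8", errors="replace")

def random_line_from_each(paths, indexes, rng=random):
    """Pick one random line from every file, in order, joined by single spaces."""
    parts = []
    for path, offsets in zip(paths, indexes):
        # Reopening per pick keeps the sketch short; a real script would
        # keep the files open between picks.
        with open(path, "rb") as f:
            parts.append(read_line_at(f, rng.choice(offsets)))
    return " ".join(parts)
```

The same path can be passed several times with one shared index, which avoids making copies of a large list. In that case the same line may be picked for two positions, so the no-duplicates property is lost unless it is checked separately.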
If supporting a minimum and maximum number of lines complicates things, it can be dropped; a single fixed length is then enough. The script must have its own output loop, either running essentially forever or stopping after a specified maximum number of output lines.
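The output loop described here could look roughly like the sketch below, assuming a fixed length and a list that fits in memory. The name `random_lines` and the `limit` parameter are invented for this example; `limit=None` means "run until interrupted".

```python
import itertools
import random

def random_lines(items, length, limit=None, rng=random):
    """Yield random output lines of a fixed length; endlessly if limit is None."""
    # count() never ends, range(limit) stops after `limit` output lines.
    counter = itertools.count() if limit is None else range(limit)
    for _ in counter:
        # sample() picks `length` different items of the list in random order.
        yield " ".join(rng.sample(items, length))

# Hypothetical in-memory copy of the example list.
items = ["foo", "two word", "bar", "#", "cat", "tea - five o'clock", "666"]
for line in random_lines(items, 2, limit=5):
    print(line)
```

Because it is a generator, nothing is stored between output lines, so an endless run does not grow in memory. The price is that it cannot tell whether a given output line has already been produced.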
In any case, I will be glad of any random option if someone can help. Thanks.