How do I extract 100,000 random lines from a file, saving them to a new file and deleting them from the original?

There is a file "fileA" with 300,000 lines. How can I cut 100,000 random lines (cut, i.e. copy them and delete them from "fileA") and save them in "fileB"?

As a result, 200,000 lines should remain in fileA, and 100,000 should be in the new fileB.

  • By "cut", do you mean "delete" or "extract"? - aleksandr barakin
  • Extract, removing them from the source file - Beginner
  • I.e., you need to save the specified number of lines separately and get a new file that no longer contains these lines? You'd better reformulate the question with an example: what you have and what you want to get as a result. - aleksandr barakin
  • Yes, that's right, or delete them in the original file - Beginner
  • "Or"??? Please do reformulate the question text, as I suggested in the previous comment. - aleksandr barakin

1 answer

Answer to the new version of the question:

  1. Shuffle the lines of the file and save them to the file перемешанные :

     $ shuf файл > перемешанные 
  2. Take the first число lines of перемешанные and save them to the file вырезанные :

     $ head -n число перемешанные > вырезанные 
  3. Take the lines of перемешанные starting from line число+1 and save them to the original файл :

     $ tail -n +$((число+1)) перемешанные > файл 
  4. Delete the file that is no longer needed:

     $ rm перемешанные 
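
The four steps can be tried end to end on generated data. This is only a sketch: the names fileA, fileB, shuffled and the variable n are illustrative, and the input is a 300-line file produced with seq rather than real data.

```shell
#!/bin/sh
# Sketch of the four steps on a small generated file.
# fileA, fileB, shuffled and n are illustrative names, not part of the answer.
set -eu
dir=$(mktemp -d)
cd "$dir"

seq 1 300 > fileA                        # sample input: 300 lines
n=100                                    # number of lines to move out

shuf fileA > shuffled                    # 1. shuffle all lines
head -n "$n" shuffled > fileB            # 2. first n shuffled lines go to fileB
tail -n +"$((n + 1))" shuffled > fileA   # 3. the remainder replaces fileA
rm shuffled                              # 4. remove the temporary file

wc -l fileA fileB                        # line counts of the two results
```

The original file is only overwritten in step 3, after the shuffled copy has been written in full, so no line is lost between the two outputs.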

If you want to keep the same order of lines as in the source file, add line numbering ( nl ), sorting ( sort ) and removal of the numbers ( cut ). Here are only the commands, without comments (see the details below, in the answer to the previous version of the question); nl -ba is used so that blank lines are numbered too:

 $ nl -ba файл | shuf > перемешанные 
 $ head -n число перемешанные | sort -n | cut -f 2- > вырезанные 
 $ tail -n +$((число+1)) перемешанные | sort -n | cut -f 2- > файл 
 $ rm перемешанные 
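
A small check of the order-preserving variant, again as a sketch with illustrative names (fileA, fileB, shuffled, n). The input is the numbers 1 to 300, so "original order" simply means ascending numeric order, which sort -n -c can verify.

```shell
#!/bin/sh
# Sketch: split at random but keep the original line order in both outputs.
# fileA, fileB, shuffled and n are illustrative names.
set -eu
dir=$(mktemp -d)
cd "$dir"

seq 1 300 > fileA
n=100

# Prefix every line (blank ones too, hence -ba) with its line number, then shuffle.
nl -ba fileA | shuf > shuffled

# Restore the original order by line number, then strip the number column.
head -n "$n" shuffled | sort -n | cut -f 2- > fileB
tail -n +"$((n + 1))" shuffled | sort -n | cut -f 2- > fileA
rm shuffled

# sort -c only checks the order and fails if a file is not sorted.
sort -n -c fileA
sort -n -c fileB
echo "both files kept the original order"
```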

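Each of these command sequences can be written as a single line using the pee program from the moreutils package. Note that pee starts its commands right away, so redirecting straight into файл could truncate it before it has been read completely; the remainder is therefore written to a temporary файл.tmp and renamed afterwards.

  • option if the line order is not important:

     $ shuf файл | pee "head -n число > вырезанные" "tail -n +$((число+1)) > файл.tmp" && mv файл.tmp файл 
  • option if the line order is important:

     $ nl -ba файл | shuf | pee "head -n число | sort -n | cut -f 2- > вырезанные" "tail -n +$((число+1)) | sort -n | cut -f 2- > файл.tmp" && mv файл.tmp файл 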

Answer to the previous version of the question:

The command

 $ shuf файл 

prints the lines of файл to stdout in random order, and the command

 $ shuf -n число файл 

limits this output to the specified число of lines.

If you instead want as many lines as the difference between the number of lines in the file and the specified число, replace число in the previous command with an expression that computes this difference, $(($(wc -l < файл) - число)) :

 $ shuf -n $(($(wc -l < файл) - число)) файл 

The output can be saved to a new file (alas, not to the same one) by adding a redirection, > новый.файл :

 $ shuf -n $(($(wc -l < файл) - число)) файл > новый.файл 

In the resulting file the lines will be in random order. If you want to keep the same line order as in the source file, you can number the lines beforehand (for example, with the nl program; -ba makes it number blank lines too), then sort them at the end ( sort ) and remove the numbers (for example, with the cut program):

 $ nl -ba файл | shuf -n $(($(wc -l < файл) - число)) | sort -n | cut -f 2- > новый.файл 
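
The same pipeline on generated data, as a sketch: file, new.file, n, total and keep are illustrative names. With 300 input lines and n=100, the difference is 300 - 100 = 200, so new.file should receive 200 lines in their original order.

```shell
#!/bin/sh
# Sketch: keep (total - n) random lines, in their original order, in a new file.
# file, new.file, n, total and keep are illustrative names.
set -eu
dir=$(mktemp -d)
cd "$dir"

seq 1 300 > file
n=100

total=$(wc -l < file)      # number of lines in the input
keep=$((total - n))        # how many lines should remain

nl -ba file | shuf -n "$keep" | sort -n | cut -f 2- > new.file

wc -l < new.file           # number of lines written to new.file
```

Unlike the answer to the new version of the question, this variant only produces the remaining lines; the lines that were dropped are not saved anywhere.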
  • Since you want to keep all the lines, just in different files, you probably need to shuffle the whole file with shuf, then split it into 2 parts and restore the order - sercxjo
  • @sercxjo, the answer was written before the author "turned the question on its head". I will now rewrite it for the current version of the question. - aleksandr barakin
  • @alexanderbarakin Please don't delete this answer, it will come in handy too - Newbie