There are 20 files, 1.txt 2.txt 3.txt .... 20.txt, each with a different number of lines (all 20 files together contain 36 thousand lines)

and there is one big file also with 36,000 lines - big.txt

Task: prepend the lines of the big file to the lines of the 20 files, line by line

example:

the lines in the big file:

  • Hi Uncle Vanya;
  • while Uncle Vanya;
  • Hello Uncle Vanya;

....etc.

the lines in the 20 files:

  • the militia is good
  • the police are good
  • firefighters are good

...etc.

As a result, the 20 files should contain:

  • Hi Uncle Vanya; the militia is good
  • while Uncle Vanya; the police are good
  • Hello Uncle Vanya; firefighters are good

...etc.

Additional condition: these 20 files must not be merged into one.

-------
Tested it; this is what I get:

- line from big.txt;first line of file 01.txt
- line from big.txt;second line of file 01.txt
- ....
- line from big.txt;second-to-last line of file 01.txt
- line from big.txt;last line of file 01.txtfirst line of file 02.txt
- line from big.txt;second line of file 02.txt
- line from big.txt;third line of file 02.txt
- ....
- line from big.txt;second-to-last line of file 02.txt
- line from big.txt;last line of file 02.txtfirst line of file 03.txt
- line from big.txt;second line of file 03.txt
- line from big.txt;third line of file 03.txt
- ....
  • Wouldn't you rather solve this with, say, perl? It is better suited for this. - Mike

1 answer

First, what you describe is quite hard to do with sed alone: it would take a very long and non-trivial program.

Second, to make it easier to work with the sequence of files, it is better to bring their names to a uniform format in which every name has the same number of digits, i.e. add a 0 to the front of 1.txt to get 01.txt (and so on):

 $ for f in [0-9].txt; do mv -- "$f" "0$f"; done 
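
To see why the padding matters, here is a small sketch run in a scratch directory. The three file names are made up for the demonstration, and LC_ALL=C is set only so that the glob order is predictable:

```shell
# Scratch directory with a few hypothetical files.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 10; do : > "$i.txt"; done

# Without padding, 10.txt sorts before 2.txt.
echo "before:" *.txt    # before: 1.txt 10.txt 2.txt

# Pad the single-digit names with a leading zero.
for f in [0-9].txt; do mv -- "$f" "0$f"; done

# Now the glob order matches the numeric order.
echo "after:" *.txt     # after: 01.txt 02.txt 10.txt
```

With equal-width names, a pattern such as [0-9][0-9].txt lists the files in numeric order, which the later steps rely on.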

Let's get started.

  1. Using paste, create one big file in which the lines are joined the way you need, separated by the ; character:

     $ paste -d ';' big.txt <(sed '$a\' [0-9][0-9].txt) > newbig.txt 
  2. Split this file into parts, each containing as many lines as the corresponding small source file:

    1. Save the total number of lines in a variable:

       $ s=$(sed '$a\' big.txt | wc -l) 
    2. Generate a string containing a very long command (once it is generated you can inspect it with echo $p; it is a sequence of calls to tee, head and tail):

       $ p=$(for f in [0-9][0-9].txt; do n=$(sed '$a\' "$f" | wc -l); echo -n "| tee >(head -n $n > new$f) | tail -n $((s-=n)) "; done) 
    3. Interpret the generated string (with a small addition) using the eval builtin:

       $ eval "sed '\$a\' newbig.txt $p >/dev/null" 

Now you should have the files new01.txt ... new20.txt with the content you need.
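
The backslash before $a in the eval line matters: inside double quotes the shell would otherwise expand $a itself before sed ever sees it. A minimal illustration (the variable names unescaped and escaped exist only for this demo, and a is assumed to be unset):

```shell
unset a

# Unescaped: the shell expands $a (unset, hence empty) inside the double quotes.
unescaped="sed '$a\' newbig.txt"

# Escaped: \$ keeps a literal dollar sign, so sed receives the script $a\ intact.
escaped="sed '\$a\' newbig.txt"

printf '%s\n' "$unescaped" "$escaped"
# sed '\' newbig.txt
# sed '$a\' newbig.txt
```

In the first form sed is handed the script \ alone, which it cannot parse.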


If you need to repeat the same operations, you can put the four commands from steps 1 and 2 into a script file and run that file instead of entering each command separately.
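
If a generated, eval'ed pipeline feels fragile inside a script, the same split can be sketched with plain line ranges. This is an alternative, not the method above: it cuts newbig.txt with sed -n 'START,ENDp' once per file. The sample contents and the intermediate name all.txt are invented for the demo, and sed '$a\' assumes GNU sed:

```shell
# Toy data in a scratch directory (contents are made up for illustration).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'A\nB\nC\n' > big.txt
printf 'one\ntwo'  > 01.txt    # deliberately no trailing newline
printf 'three\n'   > 02.txt

# Step 1: join big.txt with the concatenated small files, line by line.
for f in [0-9][0-9].txt; do sed '$a\' "$f"; done > all.txt
paste -d ';' big.txt all.txt > newbig.txt

# Step 2: cut newbig.txt back into pieces by line range.
start=1
for f in [0-9][0-9].txt; do
    n=$(sed '$a\' "$f" | wc -l)
    if [ "$n" -eq 0 ]; then
        : > "new$f"            # an empty input file gives an empty output file
        continue
    fi
    end=$((start + n - 1))
    sed -n "${start},${end}p" newbig.txt > "new$f"
    start=$((end + 1))
done

cat new01.txt new02.txt
# A;one
# B;two
# C;three
```

This reads newbig.txt once per small file, which is harmless for 20 files and 36 thousand lines.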


Addendum: the last and first lines of adjacent files being merged

With cat [0-9][0-9].txt, the last line of one file is joined to the first line of the next because the files do not end with a newline character (LF).

You can fix this "on the fly", without editing the files themselves, for example by replacing cat with sed '$a\' .

You can verify the effect, for example, by comparing the tail of a hex dump of the file contents, first like this:

 $ cat file | hexdump -C | tail -n 2 

and then like this:

 $ sed '$a\' file | hexdump -C | tail -n 2 

You will see that in the second case a character with hexadecimal code 0a appears at the end of the output (this is the linefeed, also called LF or the "newline character").

By the way, wc counts the lines in such files incorrectly, because a last line without a terminating LF is not counted. That is why, to be safe, I replaced all cat calls with sed '$a\' above.
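
A quick way to check the line-count effect, assuming GNU sed and a throwaway file (the name held in f and its two-line content are made up for the demo):

```shell
f=$(mktemp)
printf 'first\nsecond' > "$f"              # two lines, no LF after the last one

raw=$(( $(wc -l < "$f") ))                 # wc -l counts LF characters only
fixed=$(( $(sed '$a\' "$f" | wc -l) ))     # sed supplies the missing final LF

echo "raw=$raw fixed=$fixed"               # raw=1 fixed=2
rm -f "$f"
```

The file on disk is not modified; only the stream that wc reads is corrected.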

  • Tested it and found a problem: when paste creates the single file, at the boundaries between the 01.txt-20.txt files (between the last line of one file and the first line of the next) the line from big.txt is not inserted, and the last line is joined to the first line of the next file. - Beginner
  • Since formatting in comments does not display lists correctly, I added the result I get to the question itself, above. - Beginner
  • @ Human Internet, I have extended and corrected the answer. - aleksandr barakin
  • Thanks, now the last command gives an error: $ eval "sed '$a\' newbig.txt $p >/dev/null" sed: -e expression #1, char 1: unterminated address regex - Novice
  • @Humansinternet, fixed the oversight: in that line the $ must be escaped so that the shell does not try to expand $a as a variable. - aleksandr barakin