During an export, a bug inserted a line break (\n) after data3, so every record is now split across two lines. data3 is enclosed in " (double quotes). The file is very large (about 1 million lines), so fixing it by hand is not an option. How can I remove these line breaks with sed so that the tail of each record is pulled back onto one line? The file currently looks like this:

 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4

and it should look like this:

 data1,data2,"data3",data4
 data1,data2,"data3",data4
 data1,data2,"data3",data4

Thank you in advance.

    4 answers

    Try

     sed 'N;s/\n"/"/' 

    It seems to work for me:

     [VladD@Kenga] [00:59:25] [~] {0,504}$> cat xx.txt
     data1,data2,"data3
     ",data4
     data1,data2,"data3
     ",data4
     data1,data2,"data3
     ",data4
     [VladD@Kenga] [00:59:32] [~] {0,505}$> sed 'N;s/\n"/"/' xx.txt
     data1,data2,"data3",data4
     data1,data2,"data3",data4
     data1,data2,"data3",data4
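To try the one-liner without touching the real file, it can be fed a small sample built with printf. This is only a sketch: the sample rows and the variable names sample and fixed are made up for illustration, and it assumes every record is split into exactly two lines.

```shell
# Two records, each broken right before the closing quote (invented sample data).
sample=$(printf 'data1,data2,"data3\n",data4\ndata1,data2,"data3\n",data4\n')

# N appends the next input line to the pattern space, separated by \n;
# the substitution then removes the newline that precedes the quote.
fixed=$(printf '%s\n' "$sample" | sed 'N;s/\n"/"/')

printf '%s\n' "$fixed"
```

Because N always consumes lines in pairs, a single unbroken row in the input shifts the pairing for everything after it.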

    For more complex cases, where unbroken (“ordinary”) lines may also occur, try this:

     sed '/^",/{H;x;s/\n//;x;d}; x' | sed '1d' 

    Check:

     [VladD@Kenga] [01:35:47] [~] {0,539}$> cat xx.txt
     header
     "data1",data2,"data3
     ",data4
     intermediate data
     data1,"data2
     ","data3
     ",data4
     data1,data2,"data3
     ",data4
     [VladD@Kenga] [01:35:52] [~] {0,540}$> sed '/^",/{H;x;s/\n//;x;d}; x' xx.txt

     header
     "data1",data2,"data3",data4
     intermediate data
     data1,"data2","data3",data4
     data1,data2,"data3",data4
     [VladD@Kenga] [01:35:57] [~] {0,541}$> sed '/^",/{H;x;s/\n//;x;d}; x' xx.txt | sed '1d'
     header
     "data1",data2,"data3",data4
     intermediate data
     data1,"data2","data3",data4
     data1,data2,"data3",data4

    Attention: the file must end with an extra empty line after the last record. The script always prints one line behind, so whatever comes last stays in hold space and is “swallowed”!


    Explanation: when we see a line that starts with a quotation mark, we need the previous line in order to glue the two together. To make that possible, we “delay” the output by one line: each line is sent to hold space instead of being printed, and the previous line, which was waiting there, is printed in its place ( x ).

    When a line starts with a quotation mark followed by a comma ( /^",/ ), we act. Hold space contains the previous line; we append the current line to it ( H ) and exchange hold space with pattern space ( x ) so that the text can be edited. We delete the \n ( s/\n// ) and send the line back to hold space ( x ) so that it is examined and printed in the next cycle. What is left in pattern space is a leftover fragment of the line; we delete it and end this cycle ( d ).
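The same commands can be written out as a commented script, which may be easier to follow than the one-liner. This is a sketch under a few assumptions: GNU sed, input that ends with a newline, and a wrapper function name ( join_quoted ) invented here. The wrapper also appends one empty line to the input so that the last record is pushed out of hold space.

```shell
# join_quoted: hypothetical wrapper name; reads stdin, writes the repaired text.
join_quoted() {
  # Append one empty line so the final record is pushed out of hold space.
  { cat; echo; } | sed '
    /^",/ {
      # glue the current line onto the previous one waiting in hold space
      H
      # bring the glued text into pattern space and drop the newline
      x
      s/\n//
      # put the repaired line back on hold and discard the leftover fragment
      x
      d
    }
    # ordinary line: print the held (previous) line and hold this one
    x
  ' | sed '1d'   # the first cycle prints the still-empty hold space
}

fixed=$(printf 'header\n"data1",data2,"data3\n",data4\ntail\n' | join_quoted)
printf '%s\n' "$fixed"
```

For a real file the call would look like join_quoted < xx.txt > fixed.txt , where fixed.txt is just an example output name.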

    • The last version with ordinary lines worked perfectly, thanks a lot - bandsed
    • @bandsed: You're welcome! - VladD
    • @VladD, great, and it probably solves the OP's problem (though unfortunately not in the general form in which he stated it in the question, i.e. when the source line contains several data3). But it is always easier for me to write a few lines of C in five minutes than to wrestle with sed once again, working from samples such as, for example, the ones from here . - avp
    • @avp: Thank you! By the way, the idea also works with multiple breaks in a line: pastebin.com/MP4JJHV2 . sed is interesting as an exercise in very low-level programming, but higher-level tools are, of course, much more efficient to use. - VladD
    • @avp: Yeah, I assumed data3 was a placeholder, so I just look for " at the beginning of the next line. That works incorrectly if the first element of a line is itself quoted (then you probably need /^",/ ). - VladD

    If the structure of the file corresponds exactly to the given example (lines 1 and 2 must be combined, then 3 and 4, and so on), the expression can be simplified, roughly as in another answer here:

     $ sed 'N;s/\n//' old.file > new.file 

    Explanation: for every odd-numbered line, the following happens:

    • the next line is appended to the end of the pattern space
    • the \n character that separates these two lines is deleted from the pattern space
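The two steps above only give the right result when every record really is split in two. A small sketch with invented sample data (the names good and bad are illustrative) shows both the matching case and what happens when one unbroken line shifts the pairing:

```shell
# Every record broken in two: lines 1+2 and 3+4 are glued back together.
good=$(printf 'a,"b\n",c\na,"b\n",c\n' | sed 'N;s/\n//')

# One intact line ("header") ahead of the broken record: the pairs shift,
# so unrelated lines get glued and the real break is left in place.
bad=$(printf 'header\na,"b\n",c\nx,y\n' | sed 'N;s/\n//')

printf '%s\n---\n%s\n' "$good" "$bad"
```

Both samples have an even number of lines on purpose: how N behaves when there is no next line differs between sed implementations.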

      It's easier for me to write this in sh (or C).

      We merge lines wherever a \n was inserted after the specified text:

       avp@avp-xub11:hashcode$ cat ts.sh
       #!/bin/sh
       IFS=""
       while read -r s1
       do
         if echo $s1 | grep $1'$' >/dev/null ; then
           echo -n $s1
         else
           echo $s1
         fi
       done
       avp@avp-xub11:hashcode$ cat ttt
       header
       data1,data2,"data3
       ",data4
       intermediate data
       data1,"data2
       ","data3
       ",data4
       data1,data2,"data3
       ",data4
       data1,"data3
       "
       "data3 "data3
       "data4
       tailer
       avp@avp-xub11:hashcode$ ./ts.sh \"data3 < ttt
       header
       data1,data2,"data3",data4
       intermediate data
       data1,"data2
       ","data3",data4
       data1,data2,"data3",data4
       data1,"data3"
       "data3 "data3"data4
       tailer
       avp@avp-xub11:hashcode$ 

      IFS="" keeps sh (or bash) from splitting the line into words, and the -r option tells read to treat a backslash as an ordinary character (see man 1 read ).
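The effect of these two settings is easy to see on a line that contains leading spaces and a backslash. The sketch below uses an invented sample line and illustrative variable names ( line , plain , exact ):

```shell
# Hypothetical input line: spaces at both ends and a literal backslash-t.
line='  a\tb  '

# Plain read: surrounding whitespace is stripped and the backslash is consumed.
plain=$(printf '%s\n' "$line" | { read s; printf '%s' "[$s]"; })

# IFS= and -r: the line arrives exactly as it was written.
exact=$(printf '%s\n' "$line" | { IFS= read -r s; printf '%s' "[$s]"; })

printf '%s\n%s\n' "$plain" "$exact"
```

The brackets are printed only to make the preserved or stripped spaces visible.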

        You could try the tr command here:

         tr '\n\",' ",' < input_filename 

        sed is good, but I would advise tr.

        sed version

         sed ':a;N;$!ba;s/\",\n/\",/g' file 
        1. :a creates the label a
        2. N appends the next line to the pattern space
        3. $!ba branches back to label a if this is not the last line of the input
        4. s substitutes: \",\n is the regex for quote, comma, newline; \", is the replacement, quote and comma; g makes the replacement global (it is applied to every occurrence)
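The four steps can be checked on a small sample. This is a sketch assuming GNU sed (the label followed by a semicolon is a GNU convenience); the sample rows are invented so that the break sits right after the quote and comma, which is the pattern this expression looks for. The quote is written without a backslash here, which matches the same character.

```shell
# Invented sample: rows 1 and 3 end in quote-comma and continue on the next line.
sample=$(printf 'a,"b",\nc,d\ne,"f",\ng\n')

# :a;N;$!ba collects the whole input into the pattern space,
# then every quote-comma-newline is replaced by quote-comma in one pass.
joined=$(printf '%s\n' "$sample" | sed ':a;N;$!ba;s/",\n/",/g')

printf '%s\n' "$joined"
```

Since the loop holds the entire input in the pattern space before substituting, memory use grows with file size, which is worth keeping in mind for a file of a million lines.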