Removing line break characters

Question

When exporting the data, something went wrong: a \n was added after data3, and every record got broken across two lines. data3 is enclosed in " (double quotes). The file is very large (1 million lines), so fixing it by hand is not an option. How can I remove these line breaks with sed so that the tail of each record is joined back onto one line? Right now the file looks like this:

 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4

and it should look like this:

 data1,data2,"data3",data4
 data1,data2,"data3",data4
 data1,data2,"data3",data4

Thanks in advance.

Accepted Answer · 2015-06-21T23:01:17

Try

 sed 'N;s/\n"/"/'

It seems to work for me:

 [VladD@Kenga] [00:59:25] [~] {0,504}$> cat xx.txt
 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4
 data1,data2,"data3
 ",data4
 [VladD@Kenga] [00:59:32] [~] {0,505}$> sed 'N;s/\n"/"/' xx.txt
 data1,data2,"data3",data4
 data1,data2,"data3",data4
 data1,data2,"data3",data4

For more complex cases (when “ordinary” lines without breaks may also occur), try this:

 sed '/^",/{H;x;s/\n//;x;d}; x' | sed '1d'

Check:

 [VladD@Kenga] [01:35:47] [~] {0,539}$> cat xx.txt
 header
 "data1",data2,"data3
 ",data4
 intermediate data
 data1,"data2
 ","data3
 ",data4
 data1,data2,"data3
 ",data4

 [VladD@Kenga] [01:35:52] [~] {0,540}$> sed '/^",/{H;x;s/\n//;x;d}; x' xx.txt

 header
 "data1",data2,"data3",data4
 intermediate data
 data1,"data2","data3",data4
 data1,data2,"data3",data4
 [VladD@Kenga] [01:35:57] [~] {0,541}$> sed '/^",/{H;x;s/\n//;x;d}; x' xx.txt | sed '1d'
 header
 "data1",data2,"data3",data4
 intermediate data
 data1,"data2","data3",data4
 data1,data2,"data3",data4

Attention: the input must end with an empty line (i.e. one extra line break after the last record); otherwise the last record stays in hold space and is “swallowed”!

Explanation: when we see a line that starts with a quotation mark, we need the previous line in order to glue the two together. So we “delay” the output: instead of printing each line, we put it into hold space and print the previous line that was sitting there ( x ). On the very first line hold space is still empty, so an empty line is printed first; that is what the trailing sed '1d' removes.

For a line that starts with a quotation mark followed by a comma ( /^",/ ), we act: hold space contains the previous line, so we append the current one to it ( H ) and swap hold space with pattern space ( x ) so the text can be edited. We delete the \n ( s/\n// ) and send the joined line back to hold space ( x ), where it is checked again and printed in a later cycle. The leftover piece of the current line now in pattern space is deleted, and the cycle ends ( d ).
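To see the two quirks of this command in action (the empty first line that sed '1d' removes, and the record that is lost when the input does not end with an empty line), here is a minimal sketch that wraps the answer's sed program, unchanged, in a shell function. The function name join_hold and the sample lines are made up for illustration; GNU sed is assumed.

```shell
#!/bin/sh
# join_hold is a hypothetical name; the sed program is the answer's, unchanged.
# Ordinary lines: x prints the line held from the previous cycle and stores the
# current one. Lines starting with '",': H, x, s/\n//, x glue them onto the
# held line, and d suppresses output. The first cycle prints the empty hold
# space, which the second sed removes with 1d.
join_hold() {
    sed '/^",/{H;x;s/\n//;x;d}; x' | sed '1d'
}

echo '--- input ending with an empty line:'
printf '%s\n' 'header' 'data1,data2,"data3' '",data4' '' | join_hold

echo '--- same input without the empty line (last record stays in hold space):'
printf '%s\n' 'header' 'data1,data2,"data3' '",data4' | join_hold
```

With GNU sed the first call should print header and the repaired record; the second prints only header, because nothing flushes hold space at the end of input.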

The last version (with ordinary lines) worked perfectly, thanks a lot.
@VladD, great, and it probably solves the OP's task (but not, unfortunately, the general case as he formulated it in the question, i.e. if there were several data3 in the source line).
But for me it is always easier to spend five minutes writing a few lines in C than to dig through sed examples yet again, for example, from here.
By the way, the idea also works with multiple breaks in one line: pastebin.com/MP4JJHV2.
sed is interesting as an exercise in very low-level programming, but higher-level tools are, of course, much more practical.
@avp: Yes, I assumed data3 was a placeholder, so I just look for a " at the beginning of the next line. It will go wrong if the first field of a line is itself in quotes (then you probably need /^",/).
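The difference between /^"/ and /^",/ can be checked directly. In this sketch the hypothetical helper glue takes the sed address that selects continuation lines; the rest is the answer's hold-space program. With /^"/, a record whose first field is quoted ("data1",...) is itself treated as a continuation and glued onto the header; /^",/ leaves it alone. The sample data is made up, and GNU sed is assumed.

```shell
#!/bin/sh
# glue ADDRESS: hypothetical helper; ADDRESS picks the lines that are glued
# onto the previous line (everything else is the answer's hold-space program).
glue() {
    sed "$1"'{H;x;s/\n//;x;d}; x' | sed '1d'
}

sample() {
    # Hypothetical sample: the first field of the record is itself quoted.
    # The final empty line lets the last record leave hold space.
    printf '%s\n' 'header' '"data1",data2,"data3' '",data4' ''
}

echo '--- /^"/ (too broad):'
sample | glue '/^"/'      # header and record end up on one line
echo '--- /^",/ :'
sample | glue '/^",/'     # header, then the repaired record
```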

Community ♦ · Answer 2 · 2015-06-21T23:01:36

If the structure of the file exactly matches the example (lines 1 and 2 have to be joined, then 3 and 4, and so on), the expression can be simplified, roughly as in this answer:

 $ sed 'N;s/\n//' old.file > new.file

Explanation: for every odd line, sed will:

read the next line and append it to the end of the pattern space ( N );
delete the \n between these two lines from the pattern space ( s/\n// ).
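A small sketch of this pairwise join as a shell function (the name join_pairs and the sample lines are hypothetical). It only fits when every record is broken exactly once, because it blindly joins lines two by two.

```shell
#!/bin/sh
# join_pairs (hypothetical name): for every odd line, N appends the next line
# after an embedded \n, s/\n// deletes that newline, and sed prints the pair.
join_pairs() {
    sed 'N;s/\n//'
}

printf '%s\n' 'data1,data2,"data3' '",data4' 'data1,data2,"data3' '",data4' | join_pairs
```

If the number of lines is odd, N has nothing to read on the last line: GNU sed then prints the unpaired line as is, while other seds may exit without printing it.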

Answer 3 · 2015-06-22T12:45:28

It's easier for me to write this in sh (or C).

This joins a line with the next one if a \n was inserted right after the specified text:

 avp@avp-xub11:hashcode$ cat ts.sh
 #!/bin/sh
 IFS=""
 while read -r s1
 do
   if echo $s1 | grep $1'$' >/dev/null ; then
     echo -n $s1
   else
     echo $s1
   fi
 done
 avp@avp-xub11:hashcode$ cat ttt
 header
 data1,data2,"data3
 ",data4
 intermediate data
 data1,"data2
 ","data3
 ",data4
 data1,data2,"data3
 ",data4
 data1,"data3
 "
 "data3 "data3
 "data4
 tailer
 avp@avp-xub11:hashcode$ ./ts.sh \"data3 < ttt
 header
 data1,data2,"data3",data4
 intermediate data
 data1,"data2
 ","data3",data4
 data1,data2,"data3",data4
 data1,"data3"
 "data3 "data3"data4
 tailer
 avp@avp-xub11:hashcode$

IFS="" tells sh (or bash) not to split the line into words, and the -r option tells read to treat a backslash as an ordinary character (see man 1 read).
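For comparison, here is a sketch of the same loop written with printf and case instead of echo and grep. This is a variant, not the answer's script: join_after is a hypothetical name, and $1 is matched as literal text at the end of the line rather than as a grep regex. It also avoids starting a grep process for each of the million lines.

```shell
#!/bin/sh
# join_after TEXT (hypothetical name): a line that ends with the literal TEXT
# is printed without its newline, so the next line is glued onto it; any other
# line is printed with a newline.
#   IFS=     read does not split the line or trim leading/trailing blanks
#   read -r  backslashes are kept as ordinary characters
join_after() {
    while IFS= read -r line; do
        case $line in
            *"$1") printf '%s' "$line" ;;     # quoted "$1": literal match
            *)     printf '%s\n' "$line" ;;
        esac
    done
}

printf '%s\n' 'header' 'data1,data2,"data3' '",data4' | join_after '"data3'
```

As in the original, a final line that lacks a trailing newline is not returned by read, so it is not printed.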

Saidolim · Answer 4 · 2015-06-21T23:00:48

You could try the tr command here:

 tr '\n\",' ",' < input_filename

sed is fine too, but I'd recommend tr.

sed version

 sed ':a;N;$!ba;s/\n",/",/g' file

:a creates a label named a
N appends the next line to the pattern space
$!ba if this is not the last line, branches (b) back to label a, so the whole file ends up in the pattern space
s/\n",/",/g substitutes: \n", is the regex for newline-quote-comma, ", is the replacement (quote-comma), and g makes the replacement global (every occurrence, not just the first)
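Because :a;N;$!ba collects the entire file in the pattern space before the substitution runs, a 1-million-line file is held in memory at once. A streaming alternative is the classic $!N ... P;D sliding-window idiom; it is a different technique from this answer and is shown here only as a sketch (join_stream and the sample lines are hypothetical). Like the g substitution, it handles a record broken several times, and it needs neither sed '1d' nor an empty line at the end of the file.

```shell
#!/bin/sh
# join_stream (hypothetical name), classic "$!N ... P;D" sliding window:
#   :a          label a
#   $!N         unless on the last line, append the next line after a \n
#   s/\n",/",/  if that next line starts with '",', glue it on
#   ta          if a join happened, loop back: the record may be broken again
#   P           otherwise print the first line of the pattern space
#   D           delete it and restart the cycle with what is left
join_stream() {
    sed -e ':a' -e '$!N' -e 's/\n",/",/' -e 'ta' -e 'P' -e 'D'
}

printf '%s\n' 'header' 'data1,"data2' '","data3' '",data4' 'tailer' | join_stream
```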
