I have a text file in which each record starts with a 9-digit college code and ends with a 5-digit course code.
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
Some entries contain a line break, as shown in the third example above.
I need to merge the 3rd and 4th lines into one, just like the 1st and 2nd lines, so that I can easily use commands such as grep and awk.
Update:
Kevin’s answer does not seem to work.
cat todel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
cat todel.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }'
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531ege of,
Regarding split lines: this sed script assumes that you have at least one space after the leading number (on the first line of the split), one space before the trailing number (on the last line of the split), and that there is only one split per split line.
Modified to accept input with Windows CRLF newlines or *nix LF, but note that the output uses the *nix \n newline.
Or, shorter, but perhaps less readable:
I do expect the first one to be faster, because the most frequent test (for full lines) involves just a single regex, whereas the second (shorter) script needs two regex tests for that same case.
This is the output I get, using GNU sed 4.2.1
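For illustration, here is a minimal sketch of the joining idea, written from the assumptions stated above and not a copy of either script. The function name merge_split_lines is hypothetical. It treats a line as complete when it starts with 9 digits and a space and ends with a space and a number. Any other line is taken to be the first half of a split record and is joined to the line that follows it. Carriage returns are removed with tr, which assumes that no CR is wanted anywhere in the data.

```shell
# Hypothetical helper: reads records on stdin, writes merged records to stdout.
# Step 1: tr drops every carriage return, so CRLF and LF input behave the same.
# Step 2: sed prints complete lines unchanged ("b" jumps to the end of the script);
#         for any other line it appends the next line (N) and replaces the
#         embedded newline with a single space.
merge_split_lines() {
  tr -d '\r' |
    sed -e '/^[0-9]\{9\} .* [0-9]\{1,\}$/b' \
        -e 'N' \
        -e 's/\n/ /'
}
```

Run as merge_split_lines < todel.txt, this should join the two halves of the split record into one line with *nix newlines. It handles only one split per record, and a first half that happens to end in a space and a number would be mistaken for a complete line.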