I have a data file that I need to use as input for a program, but I need to tweak the formatting a little. Using the method from "Extracting specific data from a file and writing it to another file", I generated a file that looks like this:
PITG_00002 2 397
PITG_00004 1 1275
PITG_00004 1397 1969
PITG_00005 200 1111
PITG_00005 1281 1646
PITG_00006 1 816
PITG_00009 2398 3276
PITG_00009 1536 1952
PITG_00010 1 537
I need to distinguish data that comes from the same sequence (first column) from data that comes from different sequences by adding a blank line between distinct sequences, so that it looks like:
PITG_00002 2 397

PITG_00004 1 1275
PITG_00004 1397 1969

PITG_00005 200 1111
PITG_00005 1281 1646

PITG_00006 1 816

PITG_00009 2398 3276
PITG_00009 1536 1952

PITG_00010 1 537
I tagged this with the program/coding options available to me. Any help you could give is much appreciated, thanks!
This will check the first field $F[0] against the previous field, stored in $x. If they are not the same, a newline is printed.
Explanations:
-p read file and print each line
-a autosplit lines on whitespace into @F array
$/ is your input record separator, default is newline.