I have a text file which is tab separated. They can be quite big

Asked: May 26, 2026

I have a text file which is tab separated. These files can be quite big, up to 1 GB. The number of columns varies with the number of samples in the file. Each sample has eight columns. For example, sample A: ID1, ID2, MIN_A, AVG_A, MAX_A, AR1_A, AR2_A, AR3_A, AR4_A, AR5_A, of which ID1 and ID2 are common to all the samples. What I want to achieve is to split the whole file into separate files, one per sample.

ID1,ID2,MIN_A,AVG_A,MAX_A,AR1_A,AR2_A,AR3_A,AR4_A,AR5_A,MIN_B,AVG_B,MAX_B,AR1_B,AR2_B,AR3_B,AR4_B,AR5_B,MIN_C,AVG_C,MAX_C,AR1_C,AR2_C,AR3_C,AR4_C,AR5_C
12,134,3535,4545,5656,5656,7675,67567,57758,875,8678,578,57856785,85587,574,56745,567356,675489,573586,5867,576384,75486,587345,34573,45485,5447
454385,3457,485784,5673489,5658,567845,575867,45785,7568,43853,457328,3457385,567438,5678934,56845,567348,58567,548948,58649,5839,546847,458274,758345,4572384,4758475,47487

This is how my sample file looks. I want to split it into files like these:

File A:
ID1,ID2,MIN_A,AVG_A,MAX_A,AR1_A,AR2_A,AR3_A,AR4_A,AR5_A
12,134,3535,4545,5656,5656,7675,67567,57758,875
454385,3457,485784,5673489,5658,567845,575867,45785,7568,43853

File B:
ID1,ID2,MIN_B,AVG_B,MAX_B,AR1_B,AR2_B,AR3_B,AR4_B,AR5_B
12,134,8678,578,57856785,85587,574,56745,567356,675489
454385,3457,457328,3457385,567438,5678934,56845,567348,58567,548948

File C:
ID1,ID2,MIN_C,AVG_C,MAX_C,AR1_C,AR2_C,AR3_C,AR4_C,AR5_C
12,134,573586,5867,576384,75486,587345,34573,45485,5447
454385,3457,58649,5839,546847,458274,758345,4572384,4758475,47487

Is there an easier way of doing this than going through an array?

My logic so far is this: (number of headers – 2) divided by 8 gives the number of samples in the file. I would then go through each element of an array and parse it. That seems a tedious way of doing this, so I would be happy to hear of any simpler way of handling it.

Thanks
Sipra

1 Answer

Editorial Team · Answer 1 · 2026-05-26T20:01:19+00:00

#!/usr/bin/env perl

use strict;
use warnings;

# open three output filehandles
my %fh;
for (qw[A B C]) {
  open $fh{$_}, '>', "file$_" or die $!;
}

# open input
open my $in, '<', 'somefile' or die $!;

# The header line has the same layout as the data lines, so the loop below
# splits it like any other line and each output file gets its own header.
# (The header could also be parsed to work out the number of samples.)

while (<$in>) {
  chomp;
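  # The sample data is comma separated; for a tab-separated file,
  # split on /\t/ and join with "\t" instead.
  my @data = split /,/;

  # A filehandle stored in a hash element must be wrapped in a block.
  print { $fh{A} } join(',', @data[0 .. 9]), "\n";
  print { $fh{B} } join(',', @data[0, 1, 10 .. 17]), "\n";
  print { $fh{C} } join(',', @data[0, 1, 18 .. $#data]), "\n";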
}

Update: I got bored and made it cleverer, so it automatically handles any number of 8-column records in a file. Unfortunately, I don’t have time to explain it or add comments.

#!/usr/bin/env perl

use strict;
use warnings;

# open input
open my $in, '<', 'somefile' or die $!;

chomp(my $head = <$in>);
my @cols = split /,/, $head;

die 'Invalid number of records - ' . @cols . "\n"
  if (@cols - 2) % 8;

my @files;
my $name = 'A';
foreach (1 .. (@cols - 2) / 8) {
   my %desc;
   $desc{start_col} = (($_ - 1) * 8) + 2;
   $desc{end_col}   = $desc{start_col} + 7;
   open $desc{fh}, '>', 'file' . $name++ or die $!;
   print {$desc{fh}} join(',', @cols[0,1],
                               @cols[$desc{start_col} .. $desc{end_col}]),
                     "\n";

   push @files, \%desc;
}

while (<$in>) {
  chomp;
  my @data = split /,/;

  foreach my $f (@files) {
    print {$f->{fh}} join(',', @data[0,1],
                               @data[$f->{start_col} .. $f->{end_col}]),
                   "\n";
  }
}
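
For anyone puzzling over the column arithmetic in the second script, here is a minimal sketch of just that step, run against the header from the question. The names `sample_ranges` and `$sep` are made up for illustration and are not part of the scripts above. It assumes two shared ID columns followed by eight columns per sample, and it assumes a comma separator because that is what the sample data shows.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given the total column count, the number of shared
# ID columns and the number of columns per sample, return one
# [first, last] column-index pair per sample.
sub sample_ranges {
    my ($n_cols, $n_ids, $per_sample) = @_;
    my $rest = $n_cols - $n_ids;
    die "Unexpected column count: $n_cols\n"
        if $rest <= 0 or $rest % $per_sample;
    return map {
        my $start = $n_ids + $_ * $per_sample;
        [ $start, $start + $per_sample - 1 ];
    } 0 .. $rest / $per_sample - 1;
}

# Assumption: "," matches the sample data; set this to "\t" if the real
# file is tab separated.
my $sep  = ',';
my $head = 'ID1,ID2,MIN_A,AVG_A,MAX_A,AR1_A,AR2_A,AR3_A,AR4_A,AR5_A,'
         . 'MIN_B,AVG_B,MAX_B,AR1_B,AR2_B,AR3_B,AR4_B,AR5_B,'
         . 'MIN_C,AVG_C,MAX_C,AR1_C,AR2_C,AR3_C,AR4_C,AR5_C';

# The -1 limit keeps trailing empty fields, so the column count stays right
# even if the last columns of a line are empty.
my @cols = split /\Q$sep\E/, $head, -1;

# Show which columns would go to each per-sample file.
for my $r (sample_ranges(scalar @cols, 2, 8)) {
    my ($first, $last) = @$r;
    print "columns $first..$last: ",
          join($sep, @cols[0, 1, $first .. $last]), "\n";
}
```

With the 26-column header from the question this gives three ranges, 2..9, 10..17 and 18..25, which are the same slices the first script hard-codes.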
