The Archive Base

Editorial Team
Asked: June 5, 2026

I have an input file with the following format ant,1 bat,1 bat,2 cat,4 cat,1

I have an input file with the following format

ant,1
bat,1
bat,2
cat,4
cat,1
cat,2
dog,4

I need to sum column 2 for each key (column 1), so that the result is:

ant,1
bat,3
cat,7
dog,4

Other considerations:

  1. Assume that the input file is sorted
  2. The input file is pretty large (about 1M rows), so I don’t want to use an array and take up memory
  3. Each input line should be processed as we read it, and move to the next line
  4. I need to write the results to an outFile
  5. I need to do this in Perl, but pseudo-code or an algorithm would help just as well

Thanks!

This is what I came up with. I would like to know whether it can be written better or more elegantly.

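use strict;
use warnings;

# Assumption: the input and output file names are given on the command line.
my ($in_name, $out_name) = @ARGV;
open(my $infile,  '<', $in_name)  or die "Cannot open $in_name: $!";
open(my $outFile, '>', $out_name) or die "Cannot open $out_name: $!";

my ($prev_key, $sum);

while (my $curr_line = <$infile>) {
   chomp $curr_line;
   next unless $curr_line =~ /\S/;   # skip blank lines
   my @curr_cols = split(',', $curr_line);

   if (defined $prev_key && $prev_key eq $curr_cols[0]) {
      # Same key as the previous line: keep accumulating.
      $sum += $curr_cols[1];
   }
   else {
      # Key changed: write out the finished group, then start a new one.
      print $outFile "$prev_key,$sum\n" if defined $prev_key;
      ($prev_key, $sum) = @curr_cols;
   }
}

# Write out the last group.
print $outFile "$prev_key,$sum\n" if defined $prev_key;
close($outFile) or die "Cannot close $out_name: $!";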
1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 11:29 am
    #!/usr/bin/perl
    use warnings;
    use strict;
    use integer;
    
    my %a;
    while (<>) {
        my ($animal, $n) = /^\s*(\S+)\s*,\s*(\S+)/;
        $a{$animal} += $n if defined $n;
    }
    print "$_,${a{$_}}\n" for sort keys %a;
    

    This short program gives you a chance to learn Perl's excellent hash facility, used here as %a. Hashes are central to Perl; one really cannot write fluent Perl without them. Note that the hash holds one entry per distinct animal, not one per input row.

    Observe incidentally that the code exercises Perl's interesting autovivification feature. The first time a particular animal is encountered in the input stream, $a{$animal} does not yet exist, so its value is undef, which Perl treats as zero in numeric context. Thus the += operator does not fail, or even warn, though it seems that it should: += is one of the operators exempt from the "uninitialized value" warning. It simply adds to zero the first time and creates the hash entry.
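
To see that behaviour in isolation, here is a minimal sketch. The helper name add_count and the hash %totals are made up for this illustration and are not part of the program above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: add $n to the running total for $key and return
# the new total. The hash element need not exist beforehand; it starts
# out undef, which += treats as zero without an "uninitialized" warning.
sub add_count {
    my ($totals, $key, $n) = @_;
    $totals->{$key} += $n;
    return $totals->{$key};
}

my %totals;                       # starts empty, with no keys at all
add_count(\%totals, 'bat', 1);    # the 'bat' entry is created here
add_count(\%totals, 'bat', 2);    # the existing 'bat' entry is updated
print "bat,$totals{bat}\n";       # prints "bat,3"
```

No explicit initialisation such as $totals{bat} = 0 is needed before the first addition.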

    On the other hand, the number of distinct animals, and not just the number of rows, may be so large that you would rather not hold the hash %a in memory. In that case you can still compute the totals, provided only that the input is sorted by animal, as it is in your example. Something like the following might suit, though regrettably it is not nearly as neat as the above.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use integer;
    
    my $last_animal = undef;
    my $total_for_the_last_animal = 0;
    
    sub start_new_animal ($$) {
        my $next_animal = shift;
        my $n = shift;
        print "$last_animal,$total_for_the_last_animal\n"
          if defined $last_animal;
        $last_animal = $next_animal;
        $total_for_the_last_animal = $n;
    }
    
    while (<>) {
        my ($animal, $n) = /^\s*(\S+)\s*,\s*(\S+)/;
        if (
            defined($n) && defined($animal) && defined($last_animal)
              && $animal eq $last_animal
        ) { $total_for_the_last_animal += $n; }
        else { start_new_animal $animal, $n; }
    }
    start_new_animal undef, 0;
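
Since the question also asks for the results to go to an outFile, the same streaming idea can be wrapped in a subroutine that takes an input and an output filehandle. This is only a sketch under assumptions: the name sum_sorted and its two-filehandle interface are invented here, and the input is assumed to be sorted by key.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: read "key,number" lines from $in (assumed sorted by
# key) and write one "key,total" line per key to $out. Only the current
# key and its running total are kept in memory.
sub sum_sorted {
    my ($in, $out) = @_;
    my ($key, $total);
    while (my $line = <$in>) {
        my ($k, $n) = $line =~ /^\s*([^,\s]+)\s*,\s*(\S+)/
            or next;                                   # skip lines that do not parse
        if (defined $key && $k eq $key) {
            $total += $n;                              # same key: keep accumulating
        }
        else {
            print {$out} "$key,$total\n" if defined $key;   # write the finished key
            ($key, $total) = ($k, $n);                 # start the next key
        }
    }
    print {$out} "$key,$total\n" if defined $key;      # write the last key
    return;
}

# Usage sketch: in-memory filehandles stand in for the real files.
my $input  = "ant,1\nbat,1\nbat,2\ncat,4\ncat,1\ncat,2\ndog,4\n";
my $output = '';
open(my $in,  '<', \$input)  or die "Cannot open input: $!";
open(my $out, '>', \$output) or die "Cannot open output: $!";
sum_sorted($in, $out);
close($out);
print $output;    # ant,1  bat,3  cat,7  dog,4 (one per line)
```

For real files, open the two handles on the actual file names instead of on the scalars; the subroutine itself does not change.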
    
© 2021 The Archive Base. All Rights Reserved