Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 1898666
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 17, 20262026-05-17T06:47:38+00:00 2026-05-17T06:47:38+00:00

My task is to create a parent-child hierarchy file using perl. Sample Input file

  • 0

My task is to create a parent-child hierarchy file using perl.

Sample Input file (tab delimited). The records would be arranged in the file in random order and the “parent” may appear after the “child”.

 S5 S3
 S5 S8
 ROOT   S1
 S1 S7
 S2 S5
 S3 S4
 S1 S2
 S4 77
 S2 S9
 S3 88

Sample Output File (tab delimited)

ROOT    S1  S2  S5  S3  S4  77
ROOT    S1  S2  S5  S3  88
ROOT    S1  S7
ROOT    S1  S2  S5  S8
ROOT    S1  S2  S9

The code which produces the above output file

use strict;

# usage: perl parent_child_generator.pl input.txt output.txt

my $input0=$ARGV[0] or die "must provide input.txt as the first argument\n";
my $output1=$ARGV[1] or die "must provide output.txt as the second argument\n";

open(IN0,"<",$input0) || die "Cannot open $input0 for reading: $!";
open(OUT1,">",$output1) || die "Cannot open $output1 for writing: $!";

sub trim
{
    my $string=shift;
$string=~s/\r?\n$//;
$string=~s/^\s+//;
$string=~s/\s+$//;
return $string;
}

sub connectByPrior
{
my $in_child=$_[0];
my %in_hash=%{$_[1]};
my @anscestor_arr;

for (sort keys %in_hash)
{
    my $key=$_;
    my @key_arr=split(/\t/,$key);
    my $parent=$key_arr[0];
    my $child=$key_arr[1];

    if ($in_child eq $child)
    {
        push (@anscestor_arr,$parent);
        @anscestor_arr=(@{connectByPrior($parent,\%in_hash)},@anscestor_arr);
        last;
    }
}
return \@anscestor_arr;
}

my %parent_hash;
my %child_hash;
my %unsorted_hash;
while(<IN0>)
{
my @cols=split(/\t/);
for (my $i=0; $i < scalar(@cols); $i++)
{
    $cols[$i]= trim($cols[$i]);
}

my $parent=$cols[0];
my $child=$cols[1];
my $parent_child="$parent\t$child";

$parent_hash{$parent}=1;
$child_hash{$child}=1;
$unsorted_hash{$parent_child}=1;
 }
 close(IN0);

my @lev0_arr;
for (sort keys %child_hash)
{
my $rec=$_;
if (!exists($parent_hash{$rec}))
{
    push (@lev0_arr,$rec);
}
}

for (@lev0_arr)
{
my $child=$_;
my @anscestor_arr=@{connectByPrior($child,\%unsorted_hash)};
push (@anscestor_arr,$child);
print OUT1 join("\t",@anscestor_arr)."\n";
}

Question: The code works fine if input file is not too big. The actual input file contains more than 200k lines and the code is taking too long time to process the output. What improvements / changes do you suggest so that it does not take too long to process?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-17T06:47:39+00:00Added an answer on May 17, 2026 at 6:47 am

    First thing that comes to mind, although it’s completely unrelated to your actual question, is that you should at least consider making your interface more flexible. If you read from <> and print to STDOUT, you’ll be able to take input from either STDIN or an arbitrarily-sized list of files given on the command line, while output can either be viewed on the console or redirected into a file with only a minor change in calling convention:

    parent_child_generator.pl input1.txt input2.txt input3.txt > output.txt
    

    Another minor point is that the $string=~s/\r?\n$//; in trim is unnecessary[1]. $string=~s/\s+$//; will take care of it:

    $ perl -e 'my $foo = "test\r\n"; print "--$foo--\n"; $foo =~ s/\s+$//; print "--$foo--\n";'
    --test
    --
    --test--
    

    Getting to your performance issue (finally…), the core problem is that you’re calling connectByPrior for every element in @lev0_arr[2] and connectByPrior not only loops over %unsorted_hash[3] each time it’s called, but, within that loop, it calls itself recursively! At a first approximation, that’s somewhere between O(n^2 log n) and O(n^3), depending on the shape of your trees, which is simply horrible. You need to avoid touching every piece of data multiple times for every other piece of data you receive.

    So, how would I go about doing that? My first thought is to use one hash to keep track of my root nodes (all those which have nothing linking to them) and one hash of hashes (HoH) to keep track of all links. As each input line is seen, split it into parent and child, as you’re doing. If the parent does not already have an entry in the link HoH, add it to the root hash. If the child is in the root hash, remove it. If the child is not in the links HoH, add an empty hashref for it (so we’ll know in the future that it’s not a root). Finally, add an entry in the link HoH indicating that the parent links to the child.

    Output is then just a matter of iterating over the root hash (your list of starting points) and, for each node found there, recursively printing that node’s children.

    Like so:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use 5.010;
    
    my %root;
    my %link;
    
    while (<>) {
      my ($parent, $child) = split /\t/, $_, 2;
      next unless defined $parent and defined $child;
      $_ = trim($_) for ($parent, $child);
    
      $root{$parent} = 1 unless exists $link{$parent};
      delete $root{$child};
      $link{$child} ||= {};
      $link{$parent}{$child} = 1;
    }
    
    print_links($_) for sort keys %root;
    
    exit;
    
    sub trim {
      my $string=shift;
      $string=~s/^\s+//;
      $string=~s/\s+$//;
      return $string;
    }
    
    sub print_links {
      my @path = @_;
    
      my %children = %{$link{$path[-1]}};
      if (%children) {
        print_links(@path, $_) for sort keys %children;
      } else {
        say join "\t", @path;
      }
    }
    

    Given your example input, this produces the output:

    ROOT    S1      S2      S5      S3      88
    ROOT    S1      S2      S5      S3      S4      77
    ROOT    S1      S2      S5      S8
    ROOT    S1      S2      S9
    ROOT    S1      S7
    

    Since this version only touches each link once for input and once for output, it should scale more-or-less linearly as the amount of input data increases.

    (Of course, Sinan is right to suggest that you should go to CPAN if you’re really looking to get things done, but I was having fun with it.)

    Edit: Code should test whether $parent and $child are defined, not whether they’re true, per Sinan’s comment.

    [1] You should normally use chomp to remove linebreaks instead of a regex anyhow, but I’m giving you the benefit of the doubt and assuming that you may be processing input containing one style of line breaks in an environment which uses the other style.

    [2] …which contains all the leaf nodes, so it’s going to get pretty big with 200k input lines unless you have extremely narrow, deep trees.

    [3] …which contains every input line, trimmed of extraneous whitespace.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm using the project.task model where delegation creates a parent/child link between both. When
I want to create Task in Google Calendar using Google Calendar API.Using C#. Looking
I was recently given a task to create a blog using Ruby on Rails.
If the task is to create a csv file out of some data where
What would it take to create a TASK card board with mappable states like
I have to write program that create process using pipe() . My first task
In a standard setup of Parent Child relation, lets say Project and Task. Where
How to create a Sub task using REST API. I Could not find any
I am trying to figure out how to accomplish the following task: Create a
I got handed the task to create a Windows Phone app and i am

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.