I am newbie to Perl. I need to parse a tab separated text file.

Question

0

Asked: May 27, 20262026-05-27T06:10:14+00:00 2026-05-27T06:10:14+00:00

I am newbie to Perl. I need to parse a tab separated text file.

0

I am newbie to Perl. I need to parse a tab separated text file. For example:

From name   To name      Timestamp                 Interaction
a             b        Dec  2 06:40:23 IST 2000        comment
c             d        Dec  1 10:40:23 IST 2001          like
e             a        Dec  1 16:03:01 IST 2000         follow
b             c        Dec  2 07:50:29 IST 2002         share
a             c        Dec  2 08:50:29 IST 2001        comment
c             a        Dec 11 12:40:23 IST 2008          like
e             c        Dec  2 07:50:29 IST 2000         like
c             b        Dec 11 12:40:23 IST 2008        follow
b             a        Dec  2 08:50:29 IST 2001        share

After parsing I need to create groups base upon users interaction. In this example

a<->b
b<->a
c<->a
a<->c
b<->c
c<->b

for this we can create one group. and we need to display list of groups.
I need some pointers on how to parse the file and form group?

Edit
Constraint-> at least 3 user required for creating group.
Interaction is nothing but some communication is done between two user. It does not matter of which communication

My Approach for solving is

We remove repeated interaction between users . such as “a<>b like “again if “a<>b follow” is present then we remove this row.

Creating 2 dimensional array which store interaction two users i.e

          To Name   a       b        c          d

From Name

   a                X       <>       <>         X
   b                <>      X        <>         X
   c                <>      <>       X          X 
   d                X       <>       X          X

X= Represent no interaction
<>= represent interaction

In this approach we start from first row i.e “a” user check with “b”. if “a” is interact with “b” then we perform reverse of i.e “b” interact with “a”. same steps perform for each column.

But this approach depends on number of users. If 1000 users are present then we have to create 1000 X 1000 matrix. IS there any alternative to solve this

I have added sample input

a   c   Dec  2 06:40:23 IST 2000    comment
f   g   Dec  2 06:40:23 IST 2009    like
c   a   Dec  2 06:40:23 IST 2009    like
g   h   Dec  2 06:40:23 IST 2008    like
a   d   Dec  2 06:40:23 IST 2008    like
r   t   Dec  2 06:40:23 IST 2007    share
d   a   Dec  2 06:40:23 IST 2007    share
t   u   Dec  2 06:40:23 IST 2006    follow
a   e   Dec  2 06:40:23 IST 2006    follow
k   l   Dec  2 06:40:23 IST 2009    like
e   a   Dec  2 06:40:23 IST 2009    like
j   k   Dec  2 06:40:23 IST 2003    like
c   d   Dec  2 06:40:23 IST 2003    like
l   j   Dec  2 06:40:23 IST 2002    like
d   c   Dec  2 06:40:23 IST 2002    like
m   n   Dec  2 06:40:23 IST 2005    like
c   e   Dec  2 06:40:23 IST 2005    like
m   l   Dec  2 06:40:23 IST 2011    like
e   c   Dec  2 06:40:23 IST 2011    like
h   j   Dec  2 06:40:23 IST 2010    like
d   e   Dec  2 06:40:23 IST 2010    like
o   p   Dec  2 06:40:23 IST 2009    like
e   d   Dec  2 06:40:23 IST 2009    like
p   q   Dec  2 06:40:23 IST 2000    comment
q   p   Dec  2 06:40:23 IST 2009    like
a   p   Dec  2 06:40:23 IST 2008    like    
p   a   Dec  2 06:40:23 IST 2007    share
l   p   Dec  2 06:40:23 IST 2003    like
j   l   Dec  2 06:40:23 IST 2002    like
t   r   Dec  2 06:40:23 IST 2000    comment
r   h   Dec  2 06:40:23 IST 2009    like
j   f   Dec  2 06:40:23 IST 2008    like    
g   d   Dec  2 06:40:23 IST 2007    share
w   q   Dec  2 06:40:23 IST 2003    like
o   y   Dec  2 06:40:23 IST 2002    like
x   y   Dec  2 06:40:23 IST 2000    comment
y   x   Dec  2 06:40:23 IST 2009    like
x   z   Dec  2 06:40:23 IST 2008    like    
z   x   Dec  2 06:40:23 IST 2007    share
y   z   Dec  2 06:40:23 IST 2003    like
z   y   Dec  2 06:40:23 IST 2002    like

Output should be:

(a,c, d, e)
(x,y,z)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-27T06:10:15+00:00

Parsing is easy. Just a split /\t/ might be enough. However, Text::xSV or Text::CSV might be better.

For the connections, you can use the Graph module. To be able to use that module effectively, you need to understand at least the basics of graph theory.

Note that a strongly connected component is defined as:

A directed graph is called strongly connected if there is a path from each vertex in the graph to every other vertex. In particular, this means paths in each direction; a path from a to b and also a path from b to a.

The strongly connected components of a directed graph G are its maximal strongly connected subgraphs.

However, note that if you have a <-> b and b <-> c, a, b, and c will form a strongly connected component meaning that is a weaker requirement than all members of a group interacted with each other in both directions.

We can still use this to reduce the search space. Once you have candidate groups, you can then check each to see if it fits your definition of a group. If a candidate group does not meet your requirements, then you can check all subsets with one fewer members. If you don’t find any groups among those, you can then look at all subsets with two fewer members and so on until you hit the minimum group size limit.

The script below uses this idea. However, it very likely won’t scale. I strongly suspect one might be able to put together some SQL magic but my mind is far too limited for that.

#!/usr/bin/env perl

use strict;
use warnings;

use Graph;
use Algorithm::ChooseSubsets;

use constant MIN_SIZE => 3;

my $interactions = Graph->new(
    directed => 1,
);

while (my $interaction = <DATA>) {
    last unless $interaction =~ /\S/;
    my ($from, $to) = split ' ', $interaction, 3;

    $interactions->add_edge($from, $to);
}

my @groups = map {
    is_group($interactions, $_) ? $_
                                : check_subsets($interactions, $_)
} grep @$_ >= MIN_SIZE, $interactions->strongly_connected_components;


print "Groups: \n";
print "[ @$_ ]\n" for @groups;

sub check_subsets {
    my ($graph, $candidate) = @_;

    my @groups;
    for my $size (reverse MIN_SIZE .. (@$candidate - 1)) {
        my $subsets = Algorithm::ChooseSubsets->new(
            set => $candidate,
            size => $size,
        );

        my $groups_found;
        while (my $subset = $subsets->next) {
            if (is_group($interactions, $subset)) {
                ++$groups_found;
                push @groups, $subset;
            }
        }
        last if $groups_found;
    }

    return @groups;
}

sub is_group {
    my ($graph, $candidate) = @_;

    for my $member (@$candidate) {
        for my $other (@$candidate) {
            next if $member eq $other;
            return unless $graph->has_edge($member, $other);
            return unless $graph->has_edge($other, $member);
        }
    }

    return 1;
}

__DATA__
a   c   Dec  2 06:40:23 IST 2000    comment
f   g   Dec  2 06:40:23 IST 2009    like
c   a   Dec  2 06:40:23 IST 2009    like
g   h   Dec  2 06:40:23 IST 2008    like
a   d   Dec  2 06:40:23 IST 2008    like
r   t   Dec  2 06:40:23 IST 2007    share
d   a   Dec  2 06:40:23 IST 2007    share
t   u   Dec  2 06:40:23 IST 2006    follow
a   e   Dec  2 06:40:23 IST 2006    follow
k   l   Dec  2 06:40:23 IST 2009    like
e   a   Dec  2 06:40:23 IST 2009    like
j   k   Dec  2 06:40:23 IST 2003    like
c   d   Dec  2 06:40:23 IST 2003    like
l   j   Dec  2 06:40:23 IST 2002    like
d   c   Dec  2 06:40:23 IST 2002    like
m   n   Dec  2 06:40:23 IST 2005    like
c   e   Dec  2 06:40:23 IST 2005    like
m   l   Dec  2 06:40:23 IST 2011    like
e   c   Dec  2 06:40:23 IST 2011    like
h   j   Dec  2 06:40:23 IST 2010    like
d   e   Dec  2 06:40:23 IST 2010    like
o   p   Dec  2 06:40:23 IST 2009    like
e   d   Dec  2 06:40:23 IST 2009    like
p   q   Dec  2 06:40:23 IST 2000    comment
q   p   Dec  2 06:40:23 IST 2009    like
a   p   Dec  2 06:40:23 IST 2008    like
p   a   Dec  2 06:40:23 IST 2007    share
l   p   Dec  2 06:40:23 IST 2003    like
j   l   Dec  2 06:40:23 IST 2002    like
t   r   Dec  2 06:40:23 IST 2000    comment
r   h   Dec  2 06:40:23 IST 2009    like
j   f   Dec  2 06:40:23 IST 2008    like
g   d   Dec  2 06:40:23 IST 2007    share
w   q   Dec  2 06:40:23 IST 2003    like
o   y   Dec  2 06:40:23 IST 2002    like
x   y   Dec  2 06:40:23 IST 2000    comment
y   x   Dec  2 06:40:23 IST 2009    like
x   z   Dec  2 06:40:23 IST 2008    like
z   x   Dec  2 06:40:23 IST 2007    share
y   z   Dec  2 06:40:23 IST 2003    like
z   y   Dec  2 06:40:23 IST 2002    like

Output:

Groups:
[ y z x ]
[ e d a c ]

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am newbie to Perl. I need to parse a tab separated text file.

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply