The Archive Base

Editorial Team
Asked: June 16, 2026 (2026-06-16T20:21:53+00:00)

I am stuck with a rather unique problem. I am reading 2 files. A small version of each looks like the following:

File1

chr1    9873    12227   11873   2354    +   NR_046018   DDX11L1
chr1    760970  763155  762970  2185    +   NR_047520   LOC643837

File2

chr1    9871    0   
chr1    9872    1
chr1    9873    1
chr1    9874    2
chr1    9875    1
chr1    9876    3
chr1    9877    3
chr1    760970  1
chr1    760971  1
chr1    760972  1
chr1    760973  2
chr1    760974  3
chr1    760975  3
chr1    760976  4
chr1    760977  5
chr1    760978  6
chr1    760979  7
chr1    760980  6
chr1    760981  7
chr1    760982  8
chr1    760983  9
chr1    760984  10
chr1    760985  11
chr1    760986  12
chr1    760987  10
chr1    760988  9
chr1    760989  6

Problem

  1. From the 1st file, I have to take the 2nd element of each row as $start. The end position is then $end = $start + 10.

  2. Based on $start, I have to go through the 2nd file and look at the 2nd element of each row. Once $start is found, I need to sum the corresponding 3rd-element values in groups of 5, up to $end.

Since $end is $start + 10 and I am summing in groups of 5, I get 2 sums per row of the 1st file.

If some positions up to $end are not present in the 2nd column of the 2nd file, the code should not stop. It should continue summing and display the sum as 0 when a whole group of 5 consecutive positions is absent.

Taking the example files here: in File1 the 2nd element is 9873, which is assigned to $start. Thus $end is $start + 10, i.e. 9883.

In File2, once $start is found in the 2nd element of a row, the 3rd elements of the next 5 rows have to be summed as the 1st group, and the next 5 values summed as the 2nd group, up to $end.

Note

As can be seen in File2, $end (i.e. 9883) is not present. Hence the sum of the values from 9879 to 9883 must be zero. The code must not sum the values from 760970 onwards.

Desired Output

chr1    9873    12227   11873   2354    +   NR_046018   DDX11L1      10   0
chr1    760970  763155  762970  2185    +   NR_047520   LOC643837    8   25

Points to Note

  1. With the actual files, $end = $start + 10,000 (instead of $end = $start + 10).
  2. Likewise, values will be summed in groups of 25 (instead of 5), giving 400 sums in total for each row of the actual files.
  3. If a range of positions is not present in the 2nd element of $file2, summation should proceed as normal; if a continuous run of 25 positions is absent, 0 should be printed.
  4. The files contain > 1 million rows each.

Code

The code I’ve written so far manages to do the following :

  1. Read from files.
  2. Assign $start and $end from file1
  3. From file2, push all 2nd elements into array @c_posn and all 3rd elements into array @peak.
  4. Check if $start is present in @c_posn.

I am not able to figure out how to do the summation part. I had thought of creating a hash, where all 2nd elements of the 2nd file become keys and the 3rd elements become values. But the hash comes out unordered. So I created the 2 arrays, @c_posn for the 2nd elements and @peak for the 3rd elements. But now I don't know how to compare the 2 arrays simultaneously (to ensure the values from 760970 onwards don't get summed).

use 5.012;
use warnings;
use List::Util qw/first/;

my $file1 = 'chr1trialS.out';
my $file2 = 'b1.wig';

open my $fh1,'<',$file1 or die "Can't open file $file1: $!";  # double quotes interpolate; qw// does not
open my $fh2,'<',$file2 or die "Can't open file $file2: $!";

my($start, $end);
while(<$fh1>){
    my @val1 = split;
    $start = $val1[1]; #Assign start value
    $end = $start + 10; #Assign end value
    say $start,"->",$end; #Can be commented out
}

my @c_posn;
my @peak;

while(<$fh2>){
    my @val2 = split;   
    push @c_posn,$val2[1]; #Push all 2nd elements 
    push @peak, $val2[2];  #Push all 3rd elements        
}           

if (first { $_ eq $start} @c_posn) { say "I found it! " } #To check if $start is present in @c_posn

say "@c_posn"; #just to check all 2nd elements are obtained
say "@peak"; #just to check all 3rd elements are obtained   

Thank you for taking the time to go through my problem. If any clarifications are needed, please do ask me.
I will be grateful for any and every comment/answer.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 8:21 pm

    This is straightforward to do if b1.wig is small enough to be read into a hash in memory, taking the keys from column 2 and the values from column 3. Then all that must be done is to access each key in each sequence, using zero if a corresponding hash element is non-existent (and so accessing it returns undef).
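
    The "zero for a missing key" idea can be shown on its own. This is only a minimal sketch with made-up data in which positions 9876 and 9877 are deliberately absent. It relies on the defined-or operator `//`, which needs Perl 5.10 or later.

    ```perl
    use strict;
    use warnings;

    # Made-up sample: only positions 9873..9875 are present
    my %data2 = (9873 => 1, 9874 => 2, 9875 => 1);

    my $sum = 0;
    # A missing key yields undef, which // replaces with 0
    $sum += $data2{$_} // 0 for 9873 .. 9877;

    print "$sum\n";   # prints 4
    ```

    Reading a missing key this way does not add it to the hash, so the lookups leave %data2 unchanged.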

    You haven’t said how you want to separate the new totals from the existing data from chr1trialS.out so I have used spaces. Of course this is easy to change if necessary.

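    use strict;
    use warnings;

    use constant SAMPLE_SIZE => 10;  # number of positions to cover from each start
    use constant CHUNK_SIZE  => 5;   # number of positions summed into each total

    my $file1 = 'chr1trialS.out';
    my $file2 = 'b1.wig';

    # Map each position (column 2 of b1.wig) to its value (column 3)
    my %data2;
    {
      open my $fh, '<', $file2 or die "Can't open $file2: $!";

      while (<$fh>) {
        my ($key, $val) = (split)[1,2];
        $data2{$key} = $val;
      }
    }

    open my $fh, '<', $file1 or die "Can't open $file1: $!";

    while (<$fh>) {
      chomp;
      my $key = (split)[1];   # start position
      my @totals;
      my $n = 0;
      while ($n < SAMPLE_SIZE) {
        # Begin a new total at the start of every chunk
        push @totals, 0 if $n++ % CHUNK_SIZE == 0;
        # Positions missing from b1.wig count as zero
        $totals[-1] += $data2{$key++} // 0;
      }
      print "$_ @totals\n";
    }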
    

    output

    chr1    9873    12227   11873   2354    +   NR_046018   DDX11L1 10 0
    chr1    760970  763155  762970  2185    +   NR_047520   LOC643837 8 25
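
    For the real files (a window of 10,000 positions summed in groups of 25), the same loop could be wrapped in a subroutine that takes the sizes as parameters. The following is only a sketch: `chunk_sums` is a hypothetical helper, and keying the hash by chromosome plus position is an assumption for the case where the real b1.wig holds more than one chromosome. The code above keys on position alone.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper, not part of the code above: returns one total per
    # group of $chunk consecutive positions, covering $window positions from $start.
    sub chunk_sums {
        my ($data, $chrom, $start, $window, $chunk) = @_;
        my @totals;
        for my $offset (0 .. $window - 1) {
            push @totals, 0 if $offset % $chunk == 0;   # start a new group
            # Positions missing from the hash count as zero
            $totals[-1] += $data->{"$chrom\t" . ($start + $offset)} // 0;
        }
        return @totals;
    }

    # Subset of the sample b1.wig rows, keyed by "chromosome<TAB>position" (assumed key format)
    my %data = (
        "chr1\t9873" => 1, "chr1\t9874" => 2, "chr1\t9875" => 1,
        "chr1\t9876" => 3, "chr1\t9877" => 3,
    );

    # Prints 10 and 0 separated by a tab
    print join("\t", chunk_sums(\%data, 'chr1', 9873, 10, 5)), "\n";
    ```

    With the real sizes the call would be chunk_sums(\%data, $chrom, $start, 10_000, 25), which returns 400 totals. Joining with "\t" rather than a space keeps the output tab-separated, if that matches the input file.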
    