
The Archive Base

Editorial Team
Asked: June 3, 2026 (2026-06-03T20:59:10+00:00)

I have two files: file_1 has three columns (Marker (SNP), Chromosome, and position)


I have two files:

  • file_1 has three columns (Marker(SNP), Chromosome, and position)
  • file_2 has three columns (Chromosome, peak_start, and peak_end).

All columns are numeric except for the SNP column.

The files are arranged as shown in the screenshots. file_1 has several hundred SNPs as rows while file_2 has 61 peaks. Each peak is marked by a peak_start and peak_end. There can be any of the 23 chromosomes in either file and file_2 has several peaks per chromosome.

I want to find if the position of the SNP in file_1 falls within the peak_start and peak_end in file_2 for each matching chromosome. If it does, I want to show which SNP falls in which peak (preferably write output to a tab-delimited file).

I would prefer to split the file and use hashes where the chromosome is the key. I have found only a few questions remotely similar to this one, and I could not follow the suggested solutions.

Here is an example of my code. It is only meant to illustrate my question, so think of it as a rough sketch.

#!/usr/bin/perl

use strict;
use warnings;

my (%peaks, %X81_05);

# Open file or die
open(my $sample_fh, '<', 'X81_05.txt')
    or die "Could not open X81_05.txt: $!";

# Split the tab-delimited file into respective fields
while (my $line = <$sample_fh>) {
    chomp $line;
    next if $line =~ /Chromosome/; # Skip the header

    # Column order as in my sketch; adjust to match the real file
    my ($chr1, $pos, $sample) = split /\t/, $line;

    # A chromosome can carry several SNPs, so keep a list of positions per chromosome
    push @{ $X81_05{$chr1} }, $pos;
}

close($sample_fh);

# Open file using file handle
open(my $peaks_fh, '<', 'peaks.txt')
    or die "Could not open peaks.txt: $!";

while (my $line = <$peaks_fh>) {
    chomp $line;
    next if $line =~ /Chromosome/; # Skip header

    my ($chr, $peak_start, $peak_end) = split /\t/, $line;

    # Several peaks per chromosome: store each one instead of overwriting the last
    push @{ $peaks{$chr} }, { peak_start => $peak_start, peak_end => $peak_end };
}

close($peaks_fh);

for my $chr1 (keys %X81_05) {
    next unless exists $peaks{$chr1}; # Only compare within the same chromosome

    for my $val (@{ $X81_05{$chr1} }) {
        for my $peak (@{ $peaks{$chr1} }) {
            my $min = $peak->{peak_start};
            my $max = $peak->{peak_end};

            if ($val > $min and $val < $max) {
                print "$val lies between $min and $max\n";
            }
            else {
                print "$val does not lie between $min and $max\n";
            }
        }
    }
}

Screenshots referenced above:

  1. https://i.stack.imgur.com/fzwRQ.png
  2. https://i.stack.imgur.com/2ryyI.png
1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 8:59 pm

    The points raised by @David are good; try to incorporate those in your programs. (I have borrowed most of the code from @David’s post.)

    One thing I didn’t understand is why both the peak values and the positions are loaded into hashes, since loading one of the files would suffice. As each chromosome has more than one record, use a hash of arrays (HoA). My solution is based on that. You might need to change the columns and their positions.

    use strict;
    use warnings;
    
    our $Sep = "\t";
    open (my $peak_fh, "<", "data/file2") or die "Could not open data/file2: $!";
    my %chromosome_hash;
    
    while (my $line = <$peak_fh>) {
        chomp $line;
        next if $line =~ /Chromosome/; #Skip Header
        my ($chromosome) = (split($Sep, $line))[0];
        push @{$chromosome_hash{$chromosome}}, $line; # Store the line(s) indexed by chromo
    }
    close $peak_fh;
    
    open (my $position_fh, "<", "data/file1") or die "Could not open data/file1: $!";
    
    while (my $line = <$position_fh>) {
        chomp $line;
        next if $line =~ /Chromosome/; #Skip Header
        # Column order as described in the question (Marker, Chromosome, position)
        my ($snp, $chromosome, $position) = split ($Sep, $line);
        next unless exists $chromosome_hash{$chromosome};
    
        foreach my $peak_line (@{$chromosome_hash{$chromosome}}) {
            # Take start and end from the stored peak line, not from the SNP line
            my ($start,$end) = (split($Sep, $peak_line))[1,2];
    
            if ($position >= $start and $position <= $end) {
                print "MATCH REQUIRED-DETAILS...$line-$peak_line\n";
            }
            else {
                print "NO MATCH REQUIRED-DETAILS...$line-$peak_line\n";
            }
        }
    }
    close $position_fh;
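
    Since the question asks for tab-delimited output showing which SNP falls in which peak, here is a minimal self-contained sketch of the same hash-of-arrays idea. It is only an illustration under stated assumptions: the sample rows are made up, the helper names `index_peaks` and `find_hits` are hypothetical, file_1 is assumed to be ordered Marker, Chromosome, position as described in the question, and the peak bounds are treated as inclusive.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Made-up sample rows (hypothetical values, not taken from the question's files).
    my @peak_rows = (
        "Chromosome\tpeak_start\tpeak_end",
        "1\t100\t200",
        "1\t500\t600",
        "2\t50\t80",
    );
    my @snp_rows = (
        "Marker\tChromosome\tposition",
        "rs_a\t1\t150",
        "rs_b\t1\t300",
        "rs_c\t2\t80",
        "rs_d\t3\t10",
    );

    # Build a hash of arrays: chromosome => list of [start, end] pairs.
    sub index_peaks {
        my @rows = @_;
        my %peaks_by_chr;
        for my $row (@rows) {
            next if $row =~ /Chromosome/;    # skip header
            my ($chr, $start, $end) = split /\t/, $row;
            push @{ $peaks_by_chr{$chr} }, [ $start, $end ];
        }
        return \%peaks_by_chr;
    }

    # Return one tab-delimited line per SNP/peak overlap (bounds inclusive).
    sub find_hits {
        my ($peaks_by_chr, @rows) = @_;
        my @hits;
        for my $row (@rows) {
            next if $row =~ /Chromosome/;    # skip header
            my ($snp, $chr, $pos) = split /\t/, $row;
            # Chromosomes without any peak yield an empty list here.
            for my $peak (@{ $peaks_by_chr->{$chr} || [] }) {
                my ($start, $end) = @$peak;
                push @hits, join("\t", $snp, $chr, $pos, $start, $end)
                    if $pos >= $start and $pos <= $end;
            }
        }
        return @hits;
    }

    my @hits = find_hits(index_peaks(@peak_rows), @snp_rows);
    print join("\t", qw(Marker Chromosome position peak_start peak_end)), "\n";
    print "$_\n" for @hits;
    ```

    To work on the real files, the inline arrays could be replaced by lines read from the two filehandles, and the output redirected to a file from the shell. Only the peaks are indexed by chromosome; each SNP is then compared against the peaks of its own chromosome alone.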
    