The Archive Base

Editorial Team
Asked: June 12, 2026

I'm a Python beginner and have become familiar with reading through a file and doing basic operations. However, now I want to filter one file based on another: I want to remove from file1 any line whose score in column 3 of file2 is less than 100000.
I have a main data file (file1):

7   303 0.207756232686981
16  23  0.208562019758507
6   57  0.208727272727273
7   80  0.209065354884048
11  124 0.209500609013398

and I want to make a new data file identical to this one, but without any lines that have a score of less than 100000 according to a second file (file2):

chr7    303 292526
chr16   23  169805
chr6    57  62822
chr11   124 320564
chr7    80  300291

The first two columns of both files determine whether a line refers to the same case in both files. However, the second file has 'chr' prepended to the number in the first column (this 'chr' can be ignored).
All lines in the first file are present in the second file, but some lines in the second file are not in the first; these can be ignored.

So looking at the example above the line:

6   57  0.208727272727273

would be removed from the new output because its value in the 3rd column of file2 (62822) is below 100000, while all other lines in the first file would be kept, as they have values over 100000. It is also important that the output file keeps the same line order as file1.

Any help would be greatly appreciated.
I normally use the Python structure of

for line in inputfile:
    line = line.rstrip()
    fields = line.split("\t")

so an answer building off this structure would be extra great.

Please let me know if the question is unclear.

Solution so far:

#!/usr/bin/env python

# Build a lookup from file2: (chromosome, position) -> score.
d2 = {}
with open('/mnt/genotyping/CT/GreatApes/HKA/callability/callable_sites_per_region_500Kb.txt', 'r') as f2:
    for line in f2:
        line = line.rstrip()
        fields = line.split("\t")
        # Drop the 'chr' prefix so the key matches the first column of file1.
        key = (fields[0].replace('chr', ''), fields[1])
        d2[key] = int(fields[2])

# Print, in file1 order, the lines whose score in file2 is at least 100000.
with open('/mnt/genotyping/CT/GreatApes/HKA/Barcelona_approach/500kb/cov_5/Homo-Gorilla/R_plots/Gorilla_genome_dist_cov5.txt', 'r') as f1:
    for line in f1:
        line = line.rstrip()
        fields = line.split("\t")
        # Skip any line containing 'region' (presumably the header line).
        if 'region' not in line:
            key = (fields[0], fields[1])
            if d2[key] >= 100000:
                print(line)

Thanks

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 2:49 am

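    I used strings instead of files, but the principle remains the same. First, create a dict from file2, keyed on its first two columns (with the 'chr' prefix removed) and holding the score from column 3 as the value: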

    >>> f2 = """chr7\t303\t292526
    chr16\t23\t169805
    chr6\t57\t62822
    chr11\t124\t320564
    chr7\t80\t300291"""
    >>> d2 = {}
    >>> for line in f2.split('\n'):
        line = line.rstrip()
        fields = line.split("\t")
        key = (fields[0].replace('chr', ''), fields[1])
        d2[key] = int(fields[2])
    
    
    >>> d2
    {('7', '303'): 292526, ('7', '80'): 300291, ('16', '23'): 169805, ('6', '57'): 62822, ('11', '124'): 320564}
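
The same dict-building step can be wrapped in a small function, so it works on an open file as well as on a list of strings. This is only a sketch in Python 3 syntax and not part of the original answer: the name build_score_lookup is made up for illustration, and it assumes every non-blank line has at least three tab-separated fields with an integer in the third. It strips only a leading 'chr', whereas replace('chr', '') would remove 'chr' anywhere in the field.

```python
def build_score_lookup(lines):
    """Map (chromosome, position) -> score for tab-separated lines.

    `lines` can be any iterable of strings, e.g. an open file or a list.
    """
    lookup = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        chrom = fields[0]
        if chrom.startswith("chr"):
            chrom = chrom[len("chr"):]  # drop only the leading 'chr'
        lookup[(chrom, fields[1])] = int(fields[2])
    return lookup


sample = "chr7\t303\t292526\nchr6\t57\t62822"
lookup = build_score_lookup(sample.split("\n"))
print(lookup[("6", "57")])  # 62822
```

Because an open file is iterated line by line, passing the file object itself instead of sample.split("\n") should work the same way.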
    

    Then loop over file1 in its original order and print only the lines whose value in d2 is at least 100000:

    >>> f1 = """7\t303\t0.207756232686981
    16\t23\t0.208562019758507
    6\t57\t0.208727272727273
    7\t80\t0.209065354884048
    11\t124\t0.209500609013398"""
    >>> for line in f1.split('\n'):
        line = line.rstrip()
        fields = line.split("\t")
        key = (fields[0], fields[1])
        if d2[key] >= 100000:
            print(line)
    
    
    7   303 0.207756232686981
    16  23  0.208562019758507
    7   80  0.209065354884048
    11  124 0.209500609013398
    >>> 
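
Since the question asks for a new file rather than printed output, the two steps above could be combined into one function that reads both files and writes the surviving lines. This is a hedged sketch in Python 3 syntax, not the original answer's code: filter_file, its parameters and the THRESHOLD constant are hypothetical names, and it assumes tab-separated input with no header line in the score file. Lines of the data file with no match in the score file (a header line, for example) are skipped rather than raising a KeyError.

```python
THRESHOLD = 100000  # cutoff taken from the question


def filter_file(data_path, score_path, out_path, threshold=THRESHOLD):
    """Copy lines from data_path to out_path when their score passes.

    Returns the number of lines written. Line order follows data_path.
    """
    # Step 1: (chromosome, position) -> score, from the score file.
    scores = {}
    with open(score_path) as score_file:
        for line in score_file:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or incomplete lines
            chrom = fields[0]
            if chrom.startswith("chr"):
                chrom = chrom[3:]  # drop the leading 'chr'
            scores[(chrom, fields[1])] = int(fields[2])

    # Step 2: stream the data file and keep the lines that pass.
    kept = 0
    with open(data_path) as data_file, open(out_path, "w") as out_file:
        for line in data_file:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            score = scores.get((fields[0], fields[1]))
            if score is not None and score >= threshold:
                out_file.write(line)  # line still ends with its newline
                kept += 1
    return kept
```

A call might look like filter_file("file1.txt", "file2.txt", "filtered.txt"), where the three file names are placeholders. Only the score dict is held in memory; the data file is read and written one line at a time, which is what keeps the output in file1's order.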
    