The Archive Base


Editorial Team
Asked: May 11, 2026 (2026-05-11T12:58:31+00:00)


I have a 384MB text file with 50 million lines. Each line contains 2 space-separated integers: a key and a value. The file is sorted by key. I need an efficient way of looking up the values of a list of about 200 keys in Python.

My current approach is included below. It takes 30 seconds. There must be some more efficient Python-fu to get this down to a couple of seconds at most.

    # list contains a sorted list of the keys we need to look up
    # there is a sentinel at the end of list to simplify the code
    # we use pointer to iterate through the list of keys
    for line in fin:
        # a list comprehension is used because the name `list` is shadowed here
        line = [int(x) for x in line.split()]
        # first skip search keys that are missing from the file, so that a
        # match on the following search key is not overlooked
        while line[0] > list[pointer].key:
            pointer += 1
        while line[0] == list[pointer].key:
            list[pointer].value = line[1]
            pointer += 1
        if pointer >= len(list) - 1:
            break  # end of list; -1 is due to sentinel

Coded binary search + seek solution (thanks kigurai!):

    entries = 24935502  # number of entries
    width = 18          # fixed width of an entry in the file, padded with spaces
                        # at the end of each line
    for i, search in enumerate(list):  # list contains the list of search keys
        left, right = 0, entries - 1
        key = None
        while key != search and left <= right:
            mid = (left + right) // 2  # integer division (needed on Python 3)
            fin.seek(mid * width)
            key, value = map(int, fin.readline().split())
            if search > key:
                left = mid + 1
            else:
                right = mid - 1
        if key != search:
            value = None  # for when search key is not found
        search.result = value  # store the result of the search
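For anyone who wants to try the fixed-width idea in isolation, here is a minimal self-contained sketch. It makes several assumptions that are not stated in the post: the file is opened in binary mode, every record occupies exactly `WIDTH` bytes including the trailing newline, and the helper names `write_fixed_width` and `lookup_fixed_width` are made up for illustration.

```python
import os

WIDTH = 18  # assumed bytes per record, trailing newline included


def write_fixed_width(path, pairs, width=WIDTH):
    """Write key-sorted (key, value) pairs as space-padded, equal-width records."""
    with open(path, "wb") as fout:
        for key, value in pairs:
            record = f"{key} {value}".encode("ascii")
            if len(record) > width - 1:
                raise ValueError("record does not fit in the fixed width")
            # pad with spaces, then terminate the line
            fout.write(record.ljust(width - 1) + b"\n")


def lookup_fixed_width(fin, search, width=WIDTH):
    """Binary search a fixed-width file; return the value, or None if absent."""
    fin.seek(0, os.SEEK_END)
    entries = fin.tell() // width  # derive the record count from the file size
    left, right = 0, entries - 1
    while left <= right:
        mid = (left + right) // 2
        fin.seek(mid * width)  # jump straight to the start of record `mid`
        key, value = map(int, fin.readline().split())
        if key == search:
            return value
        if search > key:
            left = mid + 1
        else:
            right = mid - 1
    return None
```

Deriving the record count from the file size avoids hard-coding it, at the cost of assuming the file contains nothing but whole records.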

1 Answer

  1. Added an answer on May 11, 2026 at 12:58 pm

    If you only need 200 of the 50 million lines, then reading all of it into memory is a waste. I would sort the list of search keys and then apply binary search to the file using seek() or something similar. That way you never read the entire file into memory, which I think should speed things up.
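If the lines are not padded to a fixed width, the same idea can still be applied by bisecting on byte offsets rather than line numbers. The following is only a sketch under assumptions: the file is opened in binary mode, it is sorted by key, it contains no blank lines, and the names `line_after` and `lookup` are hypothetical helpers rather than anything from the question.

```python
import os


def line_after(fin, pos):
    """Return the first complete line that starts after byte offset pos.

    Offset 0 is special-cased: the first line of the file is returned.
    Returns b"" at end of file.
    """
    fin.seek(pos)
    if pos > 0:
        fin.readline()  # discard the (possibly partial) line containing pos
    return fin.readline()


def lookup(fin, search):
    """Binary search a key-sorted file of "key value" lines by byte offset."""
    fin.seek(0, os.SEEK_END)
    lo, hi = 0, fin.tell()
    # Find the smallest offset whose following line has key >= search.
    while lo < hi:
        mid = (lo + hi) // 2
        line = line_after(fin, mid)
        if not line or int(line.split()[0]) >= search:
            hi = mid
        else:
            lo = mid + 1
    line = line_after(fin, lo)
    if line:
        key, value = map(int, line.split())
        if key == search:
            return value
    return None  # key not present in the file
```

Each lookup needs roughly log2(file size in bytes) probes, so on the order of 30 seeks per key for a file of a few hundred megabytes. Calling `lookup` once per search key on a file opened with `open(path, "rb")` should therefore touch only a tiny fraction of the file.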
