Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 944829
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T22:39:29+00:00 2026-05-15T22:39:29+00:00

There are two large text files (Millions of lines) that my program uses. These

  • 0

There are two large text files (Millions of lines) that my program uses. These files are parsed and loaded into hashes so that the data can be accessed quickly. The problem I face is that, currently, the parsing and loading is the slowest part of the program. Below is the code where this is done.

database = extractDatabase(@type).chomp("fasta") + "yml"
revDatabase = extractDatabase(@type + "-r").chomp("fasta.reverse") + "yml"
@proteins = Hash.new
@decoyProteins = Hash.new

File.open(database, "r").each_line do |line|
  parts = line.split(": ")
  @proteins[parts[0]] = parts[1]
end

File.open(revDatabase, "r").each_line do |line|
  parts = line.split(": ")
  @decoyProteins[parts[0]] = parts[1]
end

And the files look like the example below. It started off as a YAML file, but the format was modified to increase parsing speed.

MTMDK: P31946   Q14624  Q14624-2    B5BU24  B7ZKJ8  B7Z545  Q4VY19  B2RMS9  B7Z544  Q4VY20
MTMDKSELVQK: P31946 B5BU24  Q4VY19  Q4VY20
....

I’ve messed around with different ways of setting up the file and parsing them, and so far this is the fastest way, but it’s still awfully slow.

Is there a way to improve the speed of this, or is there a whole other approach I can take?

List of things that don’t work:

  • YAML.
  • Standard Ruby threads.
  • Forking off processes and then retrieving the hash through a pipe.
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T22:39:30+00:00Added an answer on May 15, 2026 at 10:39 pm

    In my usage, reading all or part the file into memory before parsing usually goes faster. If the database sizes are small enough this could be as simple as

    buffer = File.readlines(database)
    buffer.each do |line|
        ...
    end
    

    If they’re too big to fit into memory, it gets more complicated, you have to setup block reads of data followed by parse, or threaded with separate read and parse threads.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have many large (~30 MB a piece) tab-delimited text files with variable-width lines.
I'm tasked with reading a large text file (around 150 MB), parsing it and
I have a very large comma-seperated text-file. I need to display this in a
I have a large text file of records, each delimited by a newline. Each
This particular problem is easy to solve, but I'm not so sure that the
I have a list of strings that are all early modern English words ending
I'm writing a Windows CE application in C++ directly applying the WINAPI. In this
I'm on a quest to reduce my very bloated home page application in development.
I am adjusting my old apps to iPhone 4 using the simulator at the
Just getting started on iPhone dev today and have run through Apple's HelloWorld tutorial:

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.