Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 345751
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 12, 20262026-05-12T11:06:48+00:00 2026-05-12T11:06:48+00:00

We need to read and count different types of messages/run some statistics on a

  • 0

We need to read and count different types of messages/run
some statistics on a 10 GB text file, e.g a FIX engine
log. We use Linux, 32-bit, 4 CPUs, Intel, coding in Perl but
the language doesn’t really matter.

I have found some interesting tips in Tim Bray’s
WideFinder project. However, we’ve found that using memory mapping
is inherently limited by the 32 bit architecture.

We tried using multiple processes, which seems to work
faster if we process the file in parallel using 4 processes
on 4 CPUs. Adding multi-threading slows it down, maybe
because of the cost of context switching. We tried changing
the size of thread pool, but that is still slower than
simple multi-process version.

The memory mapping part is not very stable, sometimes it
takes 80 sec and sometimes 7 sec on a 2 GB file, maybe from
page faults or something related to virtual memory usage.
Anyway, Mmap cannot scale beyond 4 GB on a 32 bit
architecture.

We tried Perl’s IPC::Mmap and Sys::Mmap. Looked
into Map-Reduce as well, but the problem is really I/O
bound, the processing itself is sufficiently fast.

So we decided to try optimize the basic I/O by tuning
buffering size, type, etc.

Can anyone who is aware of an existing project where this
problem was efficiently solved in any language/platform
point to a useful link or suggest a direction?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-12T11:06:48+00:00Added an answer on May 12, 2026 at 11:06 am

    Most of the time you will be I/O bound not CPU bound, thus just read this file through normal Perl I/O and process it in single thread. Unless you prove that you can do more I/O than your single CPU work, don’t waste your time with anything more. Anyway, you should ask: Why on Earth is this in one huge file? Why on Earth don’t they split it in a reasonable way when they generate it? It would be magnitude more worth work. Then you can put it in separate I/O channels and use more CPU’s (if you don’t use some sort of RAID 0 or NAS or …).

    Measure, don’t assume. Don’t forget to flush caches before each test. Remember that serialized I/O is a magnitude faster than random.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I need to read char[] (size is COUNT) from text file from OFFSET with
I need to use the Scanner class to read a file and count how
I need to read test.TXT file(tab delimited) into MATLAB. TXT file have form: Datum
I need to read 16 bits from the binary file as std::string or char
I need to read certain statistics from iw_statistics structure, here's the code: struct net_device
I need to read a file line by line, and change a variable accordingly.
I need to read data from XML to a List<>. The XML file contains
I need to read the editurl from one jqgrid for use in a DND
I need to read the size of a file but the server is forcing
I am dealing with some HTML text that I basically have read access to.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.