The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 12, 2026 (2026-05-12T07:30:33+00:00)

I have created an application that does the following: Make some calculations, write calculated

I have created an application that does the following:

  1. Run some calculations and write the calculated data to a file. Repeat this 500,000 times (500,000 files written one after the other), then repeat the whole pass 2 more times, for 1.5 million files written in total.
  2. Read the data from a file and run some intensive calculations on it. Repeat for 1,500,000 iterations (once for every file written in step 1).
  3. Repeat step 2 for 200 iterations.

Each file is ~212 KB, so overall I have ~300 GB of data. It looks like the entire process takes ~40 days on a 2.8 GHz Core 2 Duo CPU.

My problem, as you can probably guess, is the time it takes to complete the entire process. All the calculations are serial (each calculation depends on the one before), so I can’t parallelize the process across different CPUs or PCs. I’m trying to think of how to make the process more efficient, and I’m pretty sure most of the overhead goes to file system access (duh…). Every time I access a file, I open a handle to it and then close it once I finish reading the data.
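One way to check how much of the per-file cost is the open/close itself, rather than the read, is to time the two separately. The sketch below is only an illustration in Python (the question does not say which language the application uses); the helper names and the temporary sample files are hypothetical stand-ins for the real data.

```python
import os
import tempfile
import time


def time_open_close(paths):
    """Seconds spent only opening and closing each file, without reading."""
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb"):
            pass
    return time.perf_counter() - start


def time_open_read_close(paths):
    """Seconds spent opening, fully reading, and closing each file."""
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start


def make_sample_files(directory, count, size):
    """Create `count` files of `size` random bytes (stand-ins for real data)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 212 KB per file, matching the size quoted in the question.
        paths = make_sample_files(tmp, count=200, size=212 * 1024)
        print("open+close only:", time_open_close(paths))
        print("open+read+close:", time_open_read_close(paths))
```

Freshly written sample files will usually be served from the operating system's cache, so the read timing here understates real disk cost. Running the two timing functions against a slice of the actual data set gives a more honest comparison.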

One of my ideas to improve the run time was to use one big 300 GB file (or several big files of 50 GB each). Then I would use only one open file handle and simply seek to each relevant piece of data and read it, but I’m not sure what the overhead of opening and closing file handles is. Can someone shed some light on this?
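As a rough illustration of the single-file idea, here is a minimal Python sketch. It assumes every record has exactly the same size, so a record's offset is just its index times the record size; `RECORD_SIZE`, `write_records`, and `read_record` are hypothetical names, and real data that varies in size would need an index of offsets instead.

```python
import os
import tempfile

RECORD_SIZE = 212 * 1024  # assumed fixed record size, from the ~212 KB figure


def write_records(path, records):
    """Write equally sized records back to back into one file."""
    with open(path, "wb") as f:
        for record in records:
            if len(record) != RECORD_SIZE:
                raise ValueError("every record must be exactly RECORD_SIZE bytes")
            f.write(record)


def read_record(f, index):
    """Seek to record `index` in an already open file and read it."""
    f.seek(index * RECORD_SIZE)
    data = f.read(RECORD_SIZE)
    if len(data) != RECORD_SIZE:
        raise IndexError(f"record {index} is missing or truncated")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        write_records(path, [bytes([i]) * RECORD_SIZE for i in range(4)])
        with open(path, "rb") as f:  # one handle serves every read
            print(read_record(f, 2)[:4])
```

The handle is opened once and reused, so the per-record cost is one seek plus one read.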

Another idea I had was to group the files into bigger ~100 MB files, so that I would read 100 MB at a time instead of making many 212 KB reads, but this is much more complicated to implement than the idea above.
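The grouped-read idea may be less work than it sounds if the records are stored back to back at a fixed size. The sketch below (Python, with the hypothetical helper `iter_records_batched`) reads many records per call and hands them out one at a time; the fixed record size is an assumption, not something the question states.

```python
import io


def iter_records_batched(f, record_size, records_per_batch):
    """Yield records one at a time, but read them from `f` in large batches."""
    batch_bytes = record_size * records_per_batch
    while True:
        chunk = f.read(batch_bytes)
        if not chunk:
            return  # end of file
        if len(chunk) % record_size != 0:
            raise ValueError("file ends with a partial record")
        view = memoryview(chunk)  # slicing a memoryview avoids extra copies
        for offset in range(0, len(chunk), record_size):
            yield view[offset:offset + record_size]


if __name__ == "__main__":
    # Tiny in-memory stand-in: five 4-byte records, read two per batch.
    data = b"".join(bytes([i]) * 4 for i in range(5))
    for record in iter_records_batched(io.BytesIO(data), 4, 2):
        print(bytes(record))
```

With ~212 KB records, a batch of roughly 480 records corresponds to the ~100 MB reads described above.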

Anyway, if anyone can give me some advice on this or has any idea how to improve the run time, I would appreciate it!

Thanks.

Profiler update:

I ran a profiler on the process. It looks like the calculations take 62% of the runtime and the file reads take 34%. That means that even if I miraculously cut file I/O costs by a factor of 34, I’m still left with 24 days, which is quite an improvement, but still a long time 🙂
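The estimate above is an instance of Amdahl's law: speeding up one part of a job only shrinks that part's share. A small Python sketch of the arithmetic, using the 40-day total and the 34% I/O share from the profiler (the function name `remaining_days` is hypothetical, and the 4% not attributed to either category is counted as non-I/O time):

```python
def remaining_days(total_days, io_fraction, io_speedup):
    """Estimated runtime if only the I/O share becomes `io_speedup` times faster."""
    io_days = total_days * io_fraction
    other_days = total_days - io_days  # calculations plus everything else
    return other_days + io_days / io_speedup


if __name__ == "__main__":
    for speedup in (2, 10, 34):
        days = remaining_days(40, 0.34, speedup)
        print(f"I/O {speedup}x faster: about {days:.1f} days")
```

The calculation share alone is 0.62 × 40 ≈ 24.8 days, which is where the figure of about 24 days comes from. No amount of I/O tuning goes below that without also speeding up the calculations.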


1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 7:30 am

    Opening a file handle isn’t likely to be the bottleneck; actual disk I/O is. If you can parallelize disk access (e.g. by using multiple disks, faster disks, a RAM disk, …), you may benefit far more. Also, make sure I/O does not block the application: read from disk in the background, and process the data already read while waiting for the next read to finish, e.g. with a reader thread and a processor thread.

    Another thing: if the next step depends on the current calculation, why go through the effort of saving it to disk? A different view of the process’s dependencies might let you rework the data flow and get rid of a lot of I/O.

    Oh yes, and measure it 🙂
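The reader-plus-processor suggestion above could look roughly like this in Python. This is a sketch under assumptions, not the asker's code: `reader`, `process_all`, and the `prefetch` parameter are hypothetical names, and `process` stands in for the real calculation. A single reader thread and a FIFO queue keep the files in their original order, so a serial calculation still sees its inputs in sequence.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input


def reader(paths, out_queue):
    """Read each file fully and hand its bytes to the processor."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                out_queue.put(f.read())  # blocks while the queue is full
    finally:
        out_queue.put(_DONE)  # always signal the end, even after an error


def process_all(paths, process, prefetch=4):
    """Apply `process` to every file's bytes, in order, while a background
    thread reads ahead by up to `prefetch` files."""
    q = queue.Queue(maxsize=prefetch)
    t = threading.Thread(target=reader, args=(paths, q), daemon=True)
    t.start()
    results = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(process(item))
    t.join()
    return results
```

A call such as `process_all(paths, my_calculation)` (where `my_calculation` is whatever function consumes one file's bytes) overlaps the next reads with the current calculation. The bounded queue caps memory use at roughly `prefetch` files. In this sketch a read error in the background thread simply ends the input early, so production code would want to propagate it to the caller.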


© 2021 The Archive Base. All Rights Reserved