The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6343201
In Process

The Archive Base Latest Questions

Editorial Team
Asked: May 24, 2026

I have a small application that processes a large quantity of (relatively small) files. It runs sequentially: it loads data from a file, performs operations on it, and moves on to the next file.
I noticed that during run time the CPU usage is not 100%, and I guess this is due to the time taken by the I/O operations on the hard drive.

So the idea would be to load the next data into memory in parallel with the processing of the current data, using a separate thread (the data in question would simply be a sequence of int, stored in a vector). This seems like a very common problem, but I have a hard time finding a simple, plain C++ example of how to do it!
And now that C++0x is on its way, a simple demo using the new thread facilities, with no external library, would be very nice.
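The idea of loading the next file while the current one is processed can be sketched with std::async, keeping exactly one load in flight. This is a minimal sketch, not a tuned implementation: it assumes each file holds whitespace-separated integers as plain text, and `load_ints` and `process_all` are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file of whitespace-separated ints.
std::vector<int> load_ints(const std::string& filename)
{
    std::ifstream file(filename);
    return std::vector<int>(std::istream_iterator<int>(file),
                            std::istream_iterator<int>());
}

// Processes every file in order while the next one is loaded in the
// background. `process` is any callable taking const std::vector<int>&.
template <typename Process>
void process_all(const std::vector<std::string>& filenames, Process process)
{
    if (filenames.empty())
        return;
    // Start loading the first file right away.
    std::future<std::vector<int>> next =
        std::async(std::launch::async, load_ints, filenames[0]);
    for (std::size_t i = 0; i < filenames.size(); ++i) {
        // Wait for the prefetched data (rethrows any exception from the load).
        std::vector<int> current = next.get();
        // Start loading the following file before processing this one.
        if (i + 1 < filenames.size())
            next = std::async(std::launch::async, load_ints, filenames[i + 1]);
        process(current);
    }
}
```

Because only one load runs ahead, memory use stays bounded at roughly two files' worth of data.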

Also, although I know this depends on a lot of things, is it possible to make an educated guess about the benefits (or drawbacks) of such an approach, with respect to the size of the data file to load, for example? I guess that with large files, disk I/O operations are infrequent anyway, since the data is already buffered (by fstream(?)).
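One way to get that educated guess is to time the load and processing phases of the existing sequential loop separately. If loading takes a total of L seconds and processing P seconds, overlapping them can at best bring the run time from L + P down to about max(L, P), so the speedup is bounded by 2. A minimal sketch, assuming files are read whole into a std::string; `Timings`, `measure` and `best_case_speedup` are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Timings {
    double load_seconds = 0;     // total time spent reading files
    double process_seconds = 0;  // total time spent in the processing step
};

// Runs the sequential pipeline once and times each phase separately.
template <typename Process>
Timings measure(const std::vector<std::string>& filenames, Process process)
{
    using steady = std::chrono::steady_clock;
    Timings t;
    for (const std::string& name : filenames) {
        auto start = steady::now();
        std::ifstream file(name);
        std::ostringstream stream;
        stream << file.rdbuf();
        std::string data = stream.str();
        auto loaded = steady::now();
        process(data);
        auto done = steady::now();
        t.load_seconds += std::chrono::duration<double>(loaded - start).count();
        t.process_seconds += std::chrono::duration<double>(done - loaded).count();
    }
    return t;
}

// Upper bound on the gain from overlapping I/O with processing:
// sequential time is load + process, fully overlapped time is about max(load, process).
double best_case_speedup(const Timings& t)
{
    double sequential = t.load_seconds + t.process_seconds;
    double overlapped = std::max(t.load_seconds, t.process_seconds);
    return overlapped > 0 ? sequential / overlapped : 1.0;
}
```

Measurements taken on a warm operating-system file cache (not just fstream's own buffer) can make repeat runs look almost I/O-free, so a cold-cache run gives a more honest number.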

Olivier

1 Answer

    Editorial Team
    Added an answer on May 24, 2026 at 8:23 pm

    A toy program showing how to use some of the C++0x threading and synchronization facilities. I have no idea what its performance is like (I recommend Matt's answer); my focus is on clarity and correctness for the sake of making an example.

    The files are read separately, as you requested. They're not converted to a sequence of int, however, as I feel this is more related to processing than to strict I/O. So the files are dumped into a plain std::string.
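The conversion the answer leaves to the processing side (the `// use file here` step) could look like the following. A minimal sketch, assuming the files contain whitespace-separated integers as text; `to_ints` is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Converts the text of one file into the std::vector<int> the question uses.
// Parsing stops at the first token that is not an integer.
std::vector<int> to_ints(const std::string& contents)
{
    std::istringstream in(contents);
    std::vector<int> values;
    int value;
    while (in >> value)
        values.push_back(value);
    return values;
}
```

Doing the parsing on the processing thread keeps the reading thread purely I/O-bound, which matches the answer's split between I/O and processing.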

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <deque>
    #include <future>
    #include <mutex>
    #include <condition_variable>
    
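    int
    main(int argc, char* argv[])
    {
        // state shared between the reading thread and the processing loop
        std::mutex mutex;
        std::condition_variable condition;
        bool more_to_process = true;
        std::deque<std::string> to_process;
    
        /* Reading the files is done asynchronously */
        // the file names are taken from the command line here; initialize them as you see fit
        std::vector<std::string> filenames(argv + 1, argv + argc);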
        auto process = std::async(std::launch::async, [&](std::vector<std::string> filenames)
        {
            typedef std::lock_guard<std::mutex> lock_type;
            for(auto&& filename: filenames) {
                std::ifstream file(filename);
                if(file) {
                    std::ostringstream stream;
                    stream << file.rdbuf();
                    if(stream) {
                        lock_type lock(mutex);
                        to_process.push_back(stream.str());
                        condition.notify_one();
                    }
                }
            }
            lock_type lock(mutex);
            more_to_process = false;
            condition.notify_one();
        }, std::move(filenames));
    
        /* processing is synchronous */
        for(;;) {
            std::string file;
            {
                std::unique_lock<std::mutex> lock(mutex);
                condition.wait(lock, [&]
                { return !more_to_process || !to_process.empty(); });
    
                // the predicate guarantees that an empty queue means reading has finished
                if(to_process.empty())
                    break;
    
                file = std::move(to_process.front());
                to_process.pop_front();
            }
    
            // use file here
        }
    
        process.get();
    }
    

    Some notes:

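    • the mutex, condition variable, stop flag and std::string container are all logically related. You may as well replace them with a thread-safe container/channel.
    • I use std::async instead of std::thread because an exception thrown on the reading thread is stored in the future and rethrown by process.get(), whereas an exception escaping a std::thread calls std::terminate. (For this to help here, the reader should also clear more_to_process and notify before letting the exception escape; otherwise the processing loop keeps waiting.)
    • there is no error handling to speak of; if a file can't be read for some reason (or is empty, since inserting an empty stream buffer sets failbit on the output stream), it is silently skipped. You have several options: signal that there is nothing more to process and throw, so the error is handled as soon as possible; or use a boost::variant<std::string, std::exception_ptr> to pass the error on to the processing side of things (here the error is passed as an exception, but you can use an error_code or anything you fancy). Not an exhaustive list by any means.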
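The first note's suggestion, bundling the mutex, condition variable, stop flag and queue into one thread-safe channel, might look like this. A minimal sketch: `Channel` and its `push`/`close`/`pop` members are hypothetical names, not a standard library type.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Blocking channel combining the answer's shared state into one type.
template <typename T>
class Channel {
public:
    // Producer side: enqueue one item and wake a waiting consumer.
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(value));
        }
        condition_.notify_one();
    }

    // Producer side: announce that no more items will be pushed.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        condition_.notify_all();
    }

    // Consumer side: blocks until an item is available or the channel is
    // closed. Returns false only once it is closed and fully drained.
    bool pop(T& out)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    std::deque<T> queue_;
    bool closed_ = false;
};
```

With a `Channel<std::string>`, the reading lambda calls `push` for each file and `close` at the end, and the processing loop shrinks to `std::string file; while (channel.pop(file)) { /* use file */ }`.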
Related Questions

I have a small application that uses a SharePoint list as the data source.
I have a small application that using BackgroundWorker to process the IEnumerator<T> list at
I have a small application that I am building a Chat application into, so
I have a small application that redirects the stdout/in of an another app (usually
I have a small lightweight application that is used as part of a larger
I have a small application which embeds webbrowser controls. In that application I have
I have a small VB.NET application that I'm working on using the full version
I have a small application I am working on that at one point needs
I have a small AJAX application, written in PHP that I did not secure
I have a small command-line application written in C that acts as a wrapper/launcher

© 2021 The Archive Base. All Rights Reserved