The Archive Base

Editorial Team
Asked: May 23, 2026

I am trying to run a program that analyzes a bunch of text files

I am trying to run a program that analyzes a bunch of text files containing numbers. The total size of the text files is ~12 MB, and I take 1,000 doubles from each of 360 text files and put them into a vector. My problem is that I get about halfway through the list of text files and then my computer slows down until it isn’t processing any more files. The program is not stuck in an infinite loop, but I think it is using too much memory. Is there a better way to store this data that won’t use as much memory?

Other possibly relevant system information:

Running Linux

8 GB memory

CERN ROOT framework installed (I don’t know how to reduce my memory footprint with this though)

Intel Xeon Quad Core Processor

If you need other information, I will update this list

EDIT: I ran top, and my program's memory use keeps growing; once it got above 80% I killed it. There’s a lot of code, so I’ll pick out the bits where memory is being allocated and such to share.
EDIT 2: My code:

void FileAnalysis::doWork(std::string opath, std::string oName)
{
//sets the ouput filepath and the name of the file to contain the results
outpath = opath;
outname = oName;
//Reads the data source and writes it to a text file before pushing the filenames into a vector
setInput();
//Goes through the files queue and analyzes each file
while(!files.empty())
{
    //Puts all of the data points from the next file onto the points vector then deletes the file from the files queue
    readNext();
    //Places all of the min or max points into their respective vectors
    analyze();
    //Calculates the averages and the offset and pushes those into their respective vectors
    calcAvg();
}
makeGraph();
}

//Creates the vector of files to be read
void FileAnalysis::setInput()
{
string sysCall = "", filepath="", temp;
filepath = outpath+"filenames.txt";
sysCall = "ls "+dataFolder+" > "+filepath;
system(sysCall.c_str());
ifstream allfiles(filepath.c_str());
while (!allfiles.eof())
{
    getline(allfiles, temp);
    files.push(temp);
}
}
//Places the data from the next filename into the files vector, then deletes the filename from the vector
void FileAnalysis::readNext()
{
cout<<"Reading from "<<dataFolder<<files.front()<<endl;
ifstream curfile((dataFolder+files.front()).c_str());
string temp, temptodouble;
double tempval;
getline(curfile, temp);
while (!curfile.eof())
{

    if (temp.size()>0)
    {
        unsigned long pos = temp.find_first_of("\t");
        temptodouble = temp.substr(pos, pos);
        tempval = atof(temptodouble.c_str());
        points.push_back(tempval);
    }
    getline(curfile, temp);
}
setTime();
files.pop();
}
//Sets the maxpoints and minpoints vectors from the points vector and adds the vectors to the allmax and allmin vectors
void FileAnalysis::analyze()
{
for (unsigned int i = 1; i<points.size()-1; i++)
{
    if (points[i]>points[i-1]&&points[i]>points[i+1])
    {
        maxpoints.push_back(points[i]);
    }
    if (points[i]<points[i-1]&&points[i]<points[i+1])
    {
        minpoints.push_back(points[i]);
    }
}
allmax.push_back(maxpoints);
allmin.push_back(minpoints);
}
//Calculates the average max and min points from the maxpoints and minpoints vector and adds those averages to the avgmax and avgmin vectors, and adds the offset to the offset vector
void FileAnalysis::calcAvg()
{
double maxtotal = 0, mintotal = 0;
for (unsigned int i = 0; i<maxpoints.size(); i++)
{
    maxtotal+=maxpoints[i];
}
for (unsigned int i = 0; i<minpoints.size(); i++)
{
    mintotal+=minpoints[i];
}
avgmax.push_back(maxtotal/maxpoints.size());
avgmin.push_back(mintotal/minpoints.size());
offset.push_back((maxtotal+mintotal)/2);

}

EDIT 3: I added in the code to reserve vector space and I added code to close the files, but my memory still gets filled to 96% before the program stops…

1 Answer

    Editorial Team
    Added an answer on May 23, 2026 at 1:19 am

    This could be optimized endlessly, but my immediate reaction would be to use a container other than vector. Remember that a vector stores its elements contiguously in memory, which means adding elements causes a reallocation and copy of the entire vector if there isn’t enough spare capacity to hold the new elements.

    Try a container optimized for constant insertions, such as a queue or list.

    Alternatively, if vector is required, you could try allocating the expected memory footprint up front to avoid repeated reallocation. See vector::reserve(). Note that the reserved capacity is in terms of elements, not bytes.

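    const std::size_t numberOfItems = 1000;  // doubles read from each file
    const std::size_t numberOfFiles = 360;

    // reserve() takes a number of elements, not a number of bytes
    const std::size_t totalExpectedSize = numberOfItems * numberOfFiles;
    myVector.reserve(totalExpectedSize);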
    

    ---------- EDIT FOLLOWING CODE POST ----------

    My immediate concern would be the following logic in analyze():

    for (unsigned int i = 1; i<points.size()-1; i++) 
    {     
        if (points[i]>points[i-1]&&points[i]>points[i+1])     
        {         
            maxpoints.push_back(points[i]);     
        }     
        if (points[i]<points[i-1]&&points[i]<points[i+1])     
        {         
            minpoints.push_back(points[i]);     
        } 
    } 
    allmax.push_back(maxpoints); 
    allmin.push_back(minpoints); 
    

    Specifically, my concern is the allmax and allmin containers, onto which you are pushing copies of the maxpoints and minpoints containers. The maxpoints and minpoints containers themselves can grow quite large with this logic, depending on the datasets.

    You’re incurring the cost of container copies several times. Is it really necessary to copy the minpoints/maxpoints containers into allmax/allmin? Without knowing a bit more, it’s hard to optimize your storage design.
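If it turns out that only the per-file averages are needed later, the extrema may not need to be stored at all. The following is a minimal sketch under that assumption; `ExtremaSummary` and `summarize` are hypothetical names, not part of the posted class, and the input is assumed to be a stream of whitespace-separated doubles rather than the tab-delimited lines in the question.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: running totals for the local maxima and minima of one file.
struct ExtremaSummary {
    double maxTotal;
    double minTotal;
    std::size_t maxCount;
    std::size_t minCount;

    ExtremaSummary() : maxTotal(0.0), minTotal(0.0), maxCount(0), minCount(0) {}

    // Averages fall back to 0.0 when no extremum was found, avoiding a division by zero.
    double avgMax() const { return maxCount ? maxTotal / maxCount : 0.0; }
    double avgMin() const { return minCount ? minTotal / minCount : 0.0; }
};

// Reads doubles from `in` and keeps only a three-value window in memory.
// A value counts as a local maximum (minimum) when it is strictly greater
// (smaller) than both neighbours, the same test used in analyze().
ExtremaSummary summarize(std::istream& in) {
    ExtremaSummary s;
    double prev = 0.0, cur = 0.0, next = 0.0;
    if (!(in >> prev) || !(in >> cur)) {
        return s;  // fewer than two values: there are no interior points
    }
    while (in >> next) {
        if (cur > prev && cur > next) { s.maxTotal += cur; ++s.maxCount; }
        if (cur < prev && cur < next) { s.minTotal += cur; ++s.minCount; }
        prev = cur;
        cur = next;
    }
    return s;
}
```

Calling `summarize` on a `std::istringstream` holding `2 1 2 1 2 1 2 1 2 1 2` finds five minima and four maxima. Memory use stays constant per file this way, but it only helps if nothing downstream (for example makeGraph()) needs the individual points.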

    I don’t see anywhere that minpoints and maxpoints are actually emptied, which means that over time they can grow very large, and their corresponding copies to the allmin/allmax containers will grow very large. Are minpoints/maxpoints supposed to represent the min/max points for just one file?
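Assuming minpoints/maxpoints are indeed meant to hold the results for a single file, one possible way to hand them over to allmin/allmax without a copy, and to start the next file with empty containers, is the swap idiom. `storeAndReset` below is a hypothetical helper, not something from the posted code:

```cpp
#include <vector>

// Hypothetical helper standing in for the last two lines of analyze().
// Transfers the per-file results into `all` and leaves `perFile` empty.
void storeAndReset(std::vector<std::vector<double> >& all,
                   std::vector<double>& perFile) {
    all.push_back(std::vector<double>());  // append an empty inner vector
    all.back().swap(perFile);              // constant-time buffer exchange, no element copies
    // perFile now holds the (empty) contents of the new slot and is ready for the next file
}
```

With this, the end of analyze() would call `storeAndReset(allmax, maxpoints)` and `storeAndReset(allmin, minpoints)`. Note that calcAvg() reads minpoints and maxpoints after analyze() in the posted code, so it would have to run first or read from `allmin.back()` and `allmax.back()` instead.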

    As an example, let’s look at a simplified minpoints and allmin scenario (but keep in mind that this applies to max just as well, and both are on a larger scale than shown here). This is, obviously, a dataset engineered to show my point:

    File 1: 2 1 2 1 2 1 2 1 2 1 2
    minpoints: [1 1 1 1 1]
    allmin:    [1 1 1 1 1]
    
    File 2: 3 2 3 2 3 2 3 2 3 2 3
    minpoints: [1 1 1 1 1 2 2 2 2 2]
    allmin:    [1 1 1 1 1 1 1 1 1 1 2 2 2 2 2]
    
    File 3: 4 3 4 3 4 3 4 3 4 3 4
    minpoints: [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]
    allmin:    [1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]
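To put numbers on the scenario above, here is a small sketch that reproduces it. The helper names (`analyzeMins`, `totalStored`, `zigzag`) are made up for illustration, and it assumes each call receives only the current file's points, which the posted readNext() does not obviously guarantee since points is not cleared in the code shown.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the min-point half of analyze() from the question.
// With clearFirst == false, minpoints keeps entries from earlier files, as in the posted code.
void analyzeMins(const std::vector<double>& points,
                 std::vector<double>& minpoints,
                 std::vector<std::vector<double> >& allmin,
                 bool clearFirst) {
    if (clearFirst) {
        minpoints.clear();
    }
    // i + 1 < size() avoids the unsigned wrap-around of size() - 1 on an empty vector
    for (std::size_t i = 1; i + 1 < points.size(); ++i) {
        if (points[i] < points[i - 1] && points[i] < points[i + 1]) {
            minpoints.push_back(points[i]);
        }
    }
    allmin.push_back(minpoints);  // copies the whole vector, old entries included
}

// Total number of doubles held across all inner vectors.
std::size_t totalStored(const std::vector<std::vector<double> >& all) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < all.size(); ++i) {
        n += all[i].size();
    }
    return n;
}

// Builds an 11-value zigzag "file": low+1, low, low+1, ..., low+1.
std::vector<double> zigzag(double low) {
    std::vector<double> v;
    for (int i = 0; i < 11; ++i) {
        v.push_back(i % 2 == 0 ? low + 1 : low);
    }
    return v;
}
```

Feeding `zigzag(1)`, `zigzag(2)` and `zigzag(3)` through `analyzeMins` without clearing leaves 30 values in allmin, matching the listing above; with clearing it leaves 15. In general, n files with k minima each store k·n(n+1)/2 values when nothing is cleared, versus k·n when minpoints is reset per file. For 360 files that is 64,980·k against 360·k.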
    

    There are other optimizations and critiques to be made, but for now I’m limiting this to trying to solve your immediate problem. Can you post the makeGraph() function, as well as the definitions of all containers involved (points, minpoints, maxpoints, allmin, allmax)?
