Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8036243
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 5, 20262026-06-05T02:33:21+00:00 2026-06-05T02:33:21+00:00

I am working on a long running Python program (a part of it is

  • 0

I am working on a long running Python program (a part of it is a Flask API, and the other realtime data fetcher).

Both my long running processes iterate, quite often (the API one might even do so hundreds of times a second) over large data sets (second by second observations of certain economic series, for example 1-5MB worth of data or even more). They also interpolate, compare and do calculations between series etc.

What techniques, for the sake of keeping my processes alive, can I practice when iterating / passing as parameters / processing these large data sets? For instance, should I use the gc module and collect manually?

UPDATE

I am originally a C/C++ developer and would have NO problem (and would even enjoy) writing parts in C++. I simply have 0 experience doing so. How do I get started?

Any advice would be appreciated.
Thanks!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-05T02:33:23+00:00Added an answer on June 5, 2026 at 2:33 am

    Working with large datasets isn’t necessarily going to cause memory complications. As long as you use sound approaches when you view and manipulate your data, you can typically make frugal use of memory.

    There are two concepts you need to consider as you’re building the models that process your data.

    1. What is the smallest element of your data need access to to perform a given calculation? For example, you might have a 300GB text file filled with numbers. If you’re looking to calculate the average of the numbers, read one number at a time to calculate a running average. In this example, the smallest element is a single number in the file, since that is the only element of our data set that we need to consider at any point in time.

    2. How can you model your application such that you access these elements iteratively, one at a time, during that calculation? In our example, instead of reading the entire file at once, we’ll read one number from the file at a time. With this approach, we use a tiny amount of memory, but can process an arbitrarily large data set. Instead of passing a reference to your dataset around in memory, pass a view of your dataset, which knows how to load specific elements from it on demand (which can be freed once worked with). This similar in principle to buffering and is the approach many iterators take (e.g., xrange, open‘s file object, etc.).

    In general, the trick is understanding how to break your problem down into tiny, constant-sized pieces, and then stitching those pieces together one by one to calculate a result. You’ll find these tenants of data processing go hand-in-hand with building applications that support massive parallelism, as well.

    Looking towards gc is jumping the gun. You’ve provided only a high-level description of what you are working on, but from what you’ve said, there is no reason you need to complicate things by poking around in memory management yet. Depending on the type of analytics you are doing, consider investigating numpy which aims to lighten the burden of heavy statistical analysis.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am working on a Python backend web server that grabs realtime data from
I'm working with a long running parfor loop in matlab. parfor iter=1:1000 chunk_of_work(iter); end
I am working on displaying a progress bar using JQuery during a long running
I'm working with very long time series -- hundreds of millions of data points
I've got a long running python script that I want to be able to
I'm working on a Sidekiq worker that's sole purpose is to make several long-running
I'm getting started with using python's mrjob to convert some of my long running
I'm working with the Twitter Gem and I've created a long running ruby task.
Working with BackGroundWorker in my WPF 3.5 application to make a long running process
I am working on application which processes large amount of text data gathering statistics

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.