The Archive Base


Editorial Team
Asked: June 2, 2026 (2026-06-02T16:03:52+00:00)

I am looking for suggestions on what kind of data-structure to use for extremely


I am looking for suggestions on what kind of data-structure to use for extremely large structures in OCaml that scale well.

By “scales well” I mean no stack overflows and no exponential heap growth, assuming there is enough memory. That pretty much rules out the standard library’s List.map function. Speed isn’t much of an issue.

But for starters, let’s assume I’m operating in the realm of 2^10 – 2^100 items.

There are only three “manipulations” I perform on the structure:

(1) a map function on subsets of the structure, which either grows or shrinks the structure

(2) scanning the structure

(3) removal of specific pairs of items in the structure that satisfy a particular criterion

Originally I was using regular lists, which are still highly desirable because the structure is constantly changing. Usually, after all manipulations are performed, the structure has either at most doubled in size (or something thereabouts) or been reduced to the empty list []. Perhaps the doubling dooms me from the beginning, but it is unavoidable.

In any event, severe problems start at around 2^15 to 2^40 items (probably in part because of the naive list functions I was using). The program uses 100% of the CPU but almost no memory, and generally after a day or two it fails with a stack overflow.

I would prefer to start using more memory, if possible, in order to continue operating in larger spaces.

Anyway, if anyone has any suggestions it would be much appreciated.

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 4:03 pm (2026-06-02T16:03:53+00:00)

    If you have enough space, in theory, to contain all items of your data structure, you should look at data structures that have an efficient memory representation, with as little bookkeeping as possible. Dynamic arrays (that you resize exponentially when you need more space) are stored more efficiently than lists (which pay a full word for the tail pointer of each cell, on top of the block header), so you’d fit roughly two to three times as many elements in the same memory.
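
To make the dynamic-array idea concrete, here is a minimal sketch of an array that doubles its capacity when it fills up. The names (`dynarray`, `create`, `push`, `get`, `map_in_place`) are my own for illustration and do not come from any library; treat it as a starting point under those assumptions rather than a finished container.

```ocaml
(* A minimal growable array. All names here are illustrative. *)
type 'a dynarray = {
  mutable data : 'a array;  (* backing store, possibly larger than [len] *)
  mutable len : int;        (* number of slots actually in use *)
}

let create () = { data = [||]; len = 0 }

(* Append [x], doubling the capacity when the backing store is full. *)
let push d x =
  if d.len = Array.length d.data then begin
    let cap = max 4 (2 * d.len) in
    let fresh = Array.make cap x in  (* [x] only serves as a filler value *)
    Array.blit d.data 0 fresh 0 d.len;
    d.data <- fresh
  end;
  d.data.(d.len) <- x;
  d.len <- d.len + 1

(* Bounds-checked read against the logical length, not the capacity. *)
let get d i =
  if i < 0 || i >= d.len then invalid_arg "get";
  d.data.(i)

(* Map in place with a plain loop, so the stack does not grow. *)
let map_in_place f d =
  for i = 0 to d.len - 1 do
    d.data.(i) <- f d.data.(i)
  done
```

Because every traversal is a `for` loop, the stack depth stays constant no matter how many elements are stored; only the heap grows, and the doubling keeps the amortized cost of `push` constant.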

    If you cannot hold all elements in memory (which is what your numbers suggest), you should go for a more abstract representation. It’s difficult to say more without knowing what your elements are, but maybe an example of an abstract representation will help you devise what you need.

    Imagine that I want to record sets of integers. I want to take unions and intersections of those sets, and also perform some funkier operations such as “get all elements that are multiples of a given integer”. I want to be able to do that for really large sets (zillions of distinct integers), and then I want to be able to pick one element, any one, from the set I have built. Instead of trying to store lists of integers, or sets of integers, or arrays of booleans, what I can do is store the logical formulas corresponding to the definitions of those sets: a set of integers P is characterized by a formula F such that F(n) ⇔ n∈P. I can therefore define a type of predicates (conditions):

    type predicate =
      | Segment of int * int            (* n ∈ [a;b] *)
      | Inter of predicate * predicate  (* n satisfies both predicates *)
      | Union of predicate * predicate  (* n satisfies at least one predicate *)
      | Multiple of int                 (* n mod a = 0 *)
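
As a small illustration of reading such a formula as F(n) ⇔ n∈P, a membership test can be written by direct recursion on the predicate. The function name `mem` is my own, and treating `Multiple 0` as the empty set is an assumption I make only to avoid a division by zero.

```ocaml
type predicate =
  | Segment of int * int   (* n ∈ [a;b] *)
  | Inter of predicate * predicate
  | Union of predicate * predicate
  | Multiple of int        (* n mod a = 0 *)

(* [mem p n] decides whether n belongs to the set described by p.
   Assumption: [Multiple 0] describes the empty set. *)
let rec mem p n =
  match p with
  | Segment (a, b) -> a <= n && n <= b
  | Inter (p1, p2) -> mem p1 n && mem p2 n
  | Union (p1, p2) -> mem p1 n || mem p2 n
  | Multiple a -> a <> 0 && n mod a = 0
```

The recursion depth follows the nesting of the formula, not the number of integers in the set, so a set with zillions of elements costs no more to query than a small one.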
    

    Storing these formulas requires little memory (proportional to the total number of operations I apply). Building an intersection or a union takes constant time. Then I’ll have some work to do to find an element satisfying the formula: basically I’ll have to reason about what those formulas mean, get a normal form out of them (they are all of the form “the elements of a finite union of intervals satisfying some modulo criteria”), and from there extract some element.
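
One possible way to carry out that last step is sketched below: each formula is normalized into a list of conjuncts of the form "lo <= n <= hi and n mod m = 0", and an element is then picked from the first non-empty conjunct. The names `conj`, `normalize`, `pick_conj` and `pick` are my own, `Multiple 0` is assumed to denote the empty set, and overflow in `lcm` is ignored, so take this as a sketch rather than a robust implementation.

```ocaml
type predicate =
  | Segment of int * int
  | Inter of predicate * predicate
  | Union of predicate * predicate
  | Multiple of int

(* One conjunct of the normal form: lo <= n <= hi (None means unbounded)
   and n mod m = 0, with m >= 1. *)
type conj = { lo : int option; hi : int option; m : int }

let rec gcd a b = if b = 0 then abs a else gcd b (a mod b)
let lcm a b = a / gcd a b * b  (* a, b >= 1 here; overflow is ignored *)

(* Combine two optional bounds, treating None as "no constraint". *)
let combine f x y =
  match x, y with
  | None, z | z, None -> z
  | Some a, Some b -> Some (f a b)

let inter_conj c d =
  { lo = combine max c.lo d.lo; hi = combine min c.hi d.hi; m = lcm c.m d.m }

(* The normal form is a union (list) of conjuncts. Intersection
   distributes over union, so its size can multiply. *)
let rec normalize = function
  | Segment (a, b) -> [ { lo = Some a; hi = Some b; m = 1 } ]
  | Multiple 0 -> []  (* assumption: no multiples of 0 *)
  | Multiple a -> [ { lo = None; hi = None; m = abs a } ]
  | Union (p, q) -> normalize p @ normalize q
  | Inter (p, q) ->
    let qs = normalize q in
    List.concat_map (fun c -> List.map (inter_conj c) qs) (normalize p)

(* Smallest multiple of m that is >= x (m >= 1). *)
let ceil_mult m x =
  let r = x mod m in
  if r = 0 then x else if r > 0 then x + m - r else x - r

(* Largest multiple of m that is <= x (m >= 1). *)
let floor_mult m x =
  let r = x mod m in
  if r >= 0 then x - r else x - r - m

(* Some element of the conjunct, or None if it is empty. *)
let pick_conj c =
  let candidate =
    match c.lo, c.hi with
    | Some l, _ -> ceil_mult c.m l
    | None, Some h -> floor_mult c.m h
    | None, None -> 0
  in
  match c.hi with
  | Some h when candidate > h -> None
  | _ -> Some candidate

(* Some element of the set described by p, without enumerating it. *)
let pick p = List.find_map pick_conj (normalize p)
```

The work done by `pick` depends on the size of the normalized formula, not on the width of the intervals, which is the point of working at this abstract level. It relies on `List.concat_map` and `List.find_map`, which need a reasonably recent standard library.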

    In the general case, when you get a “command” on your data set, such as “add the result of mapping over this subset”, you can always store the command as data (as part of the definition of your structure) instead of actually evaluating it. The more precisely you can describe those commands, the more precisely you will be able to work on them at this abstract level, without actually computing the elements. For example, you say “map”, but storing an (elem -> elem) function will not let you reason easily about the result; maybe you can formulate that mapping operation as a concrete combination of operations.
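
To illustrate storing a "map" command as data, here is a hypothetical sketch in which the mapping is restricted to a tiny language of operations (add a constant, multiply by a non-zero constant, compose). The types `op` and `set` and all their constructors are invented for this example; your real operations will differ, but the idea of inverting a described operation instead of applying an opaque function carries over.

```ocaml
(* A hypothetical command language, invented for illustration. *)
type op =
  | Add of int          (* n -> n + k *)
  | Scale of int        (* n -> n * k, with k assumed non-zero *)
  | Compose of op * op  (* apply the first op, then the second *)

type set =
  | Range of int * int  (* all n in [a;b] *)
  | Mapped of op * set  (* image of a set under an op, never computed *)
  | Both of set * set   (* union of two sets *)

(* Membership is decided by undoing each stored op on the candidate,
   so no element of the underlying set is ever materialized. *)
let rec mem s n =
  match s with
  | Range (a, b) -> a <= n && n <= b
  | Both (s1, s2) -> mem s1 n || mem s2 n
  | Mapped (Add k, s') -> mem s' (n - k)
  | Mapped (Scale k, s') -> k <> 0 && n mod k = 0 && mem s' (n / k)
  | Mapped (Compose (f, g), s') -> mem (Mapped (g, Mapped (f, s'))) n
```

For instance, `Mapped (Compose (Scale 2, Add 1), Range (0, 1_000_000_000))` describes a billion odd numbers in a handful of words, and `mem` answers queries about it in a few steps. An opaque `(elem -> elem)` closure in place of `op` would not allow this kind of inversion.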
