The Archive Base


Editorial Team
Asked: June 14, 2026


I have a large, strictly increasing array (10 million integers) of offsets for another, larger, data array. No element in data is greater than 50. For example,

unsigned char data[70*1000*1000] = {0,2,1,1,0,2,1,4,2, ...};
unsigned int offsets[10*1000*1000] = {0,1,2,4,6,7,8, ...};

Then I would like to find the count of each element in a series of ranges that are not known until runtime, including only elements whose offsets are included in the offsets array. The endpoints of each range refer to indices of the data array, not to the offsets. For example, the data for the range [1,4] would be:

1 zero
1 one
1 two

The results include only one “one” because, while both data[3] and data[2] are equal to one, 3 is not included in offsets.

I need to compute these binned counts for several hundred ranges, some of which span the entire array. I considered iterating through the data array to store a cumulative sum for each bin and element, but the memory requirements would have been prohibitive. Here is a simple version of my implementation:

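for(int i=0; i<range_count; i++){
    unsigned int j=0;
    // skip the offsets that lie before the start of range i
    while(j < 10000000 and offsets[j]<range_starts[i]) j++;
    // count every offset that falls inside [range_starts[i], range_ends[i]]
    while(j < 10000000 and offsets[j]<=range_ends[i]) bins[i][data[offsets[j++]]]++;
}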

Is there any more efficient way to compute these counts?

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 12:25 pm

    While Ruben’s answer did improve the time of the counts by about half, it remained too slow for my application. I include my solution here for the curious.

    First, I set every element of the data array that is not indexed by offsets to an unused value (51, for example). This removed the need to track offsets, because I could simply ignore the contents of bin 51 when reporting results.
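
The masking step could look roughly like the sketch below. The helper name `mask_unindexed` and the use of `std::vector` are assumptions made for illustration and do not appear in the original code; the sketch only relies on offsets being strictly increasing, so a single pass over data is enough.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): overwrite every element
// of `data` whose index is absent from the strictly increasing `offsets`.
void mask_unindexed(std::vector<unsigned char>& data,
                    const std::vector<unsigned int>& offsets,
                    unsigned char sentinel = 51) {
    std::size_t k = 0;  // position of the next offset still to be matched
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (k < offsets.size() && offsets[k] == i) {
            ++k;                 // index i is listed in offsets: keep data[i]
        } else {
            data[i] = sentinel;  // index i is not listed: mark it as ignorable
        }
    }
}
```

With the example arrays from the question, indices 3 and 5 are the only ones missing from offsets, so only those two elements are replaced by 51.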

    While I mentioned in the question that storing cumulative counts for every bin at every element would require too much memory, I was able to compute the cumulative counts for each bin at each range endpoint in a single linear pass over the data. Then, for each range, I calculated the occurrences of each element by subtracting the cumulative count for that element at the left endpoint of the range from the count at the right endpoint, and finally added the element at the right endpoint itself, because the ranges are inclusive. Here is what I used:

    struct range{
        unsigned int lowerbound;
        unsigned int upperbound;
        unsigned int bins[52];      // 0..50 are real values, 51 is the ignored sentinel
    };
    
    struct endpoint{
        unsigned int n;             // index into data
        unsigned int counts[52];    // counts of each value seen before index n
    };
    
    range ranges[N_RANGES];             // the loops below rely on ascending lowerbound
    endpoint endpoints[N_RANGES*2];     // the loops below rely on ascending n
    unsigned int cumulative_counts[52] = {0};
    
    // ... < data manipulation > ... 
    
    endpoint* first_ep = &endpoints[0];
    endpoint* last_ep = &endpoints[N_RANGES*2-1];
    endpoint* next_ep;
    
    // Single pass over data: snapshot the running counts at each endpoint.
    for(next_ep=&endpoints[0];next_ep<last_ep;next_ep++){
        unsigned char* i = &data[next_ep->n];
        unsigned char* i_end = &data[(next_ep+1)->n];
        for(int j=0;j<51;j++) next_ep->counts[j] = cumulative_counts[j];
        while(i<i_end) cumulative_counts[*(i++)]++;
    }
    for(int i=0;i<51;i++) last_ep->counts[i] = cumulative_counts[i];
    // Each range is the difference between the snapshots at its two endpoints.
    for(int i=0;i<N_RANGES;i++){
        while(first_ep->n != ranges[i].lowerbound) first_ep++;
        last_ep = first_ep+1;
        while(last_ep->n != ranges[i].upperbound) last_ep++;
        for(int j=0;j<51;j++) ranges[i].bins[j] = last_ep->counts[j]-first_ep->counts[j];
        ranges[i].bins[data[last_ep->n]]++;   // include the right endpoint itself
    }
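
For readers who want something they can compile directly, here is a compact, self-contained sketch of the same endpoint-snapshot idea. It is not the code I ran: the names `count_ranges`, `Counts` and `kBins` are made up for illustration, it uses `std::vector` instead of raw arrays, it snapshots at `upperbound + 1` rather than adding the right endpoint afterwards, and it finds each endpoint with a binary search instead of a linear scan. It assumes every range satisfies lower <= upper < data.size() and that data has already been masked with the sentinel 51.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kBins = 52;  // values 0..50 plus the sentinel 51
using Counts = std::array<unsigned int, kBins>;

// Hypothetical helper: histogram of data[lo..hi] (inclusive) for every range,
// computed with one pass over data. Bin 51 should be ignored by the caller.
std::vector<Counts> count_ranges(
        const std::vector<unsigned char>& data,
        const std::vector<std::pair<unsigned int, unsigned int>>& ranges) {
    // Collect the sorted, unique cut points: lo and hi + 1 (exclusive end).
    std::vector<std::size_t> points;
    for (const auto& r : ranges) {
        points.push_back(r.first);
        points.push_back(static_cast<std::size_t>(r.second) + 1);
    }
    std::sort(points.begin(), points.end());
    points.erase(std::unique(points.begin(), points.end()), points.end());

    // snapshots[k] holds the histogram of data[0 .. points[k] - 1].
    std::vector<Counts> snapshots(points.size());
    Counts running{};
    std::size_t pos = 0;
    for (std::size_t k = 0; k < points.size(); ++k) {
        for (; pos < points[k]; ++pos) ++running[data[pos]];
        snapshots[k] = running;
    }

    // Each range is the difference of the snapshots at its two cut points.
    std::vector<Counts> result(ranges.size());
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const std::size_t lo = ranges[i].first;
        const std::size_t hi = static_cast<std::size_t>(ranges[i].second) + 1;
        const auto a = std::lower_bound(points.begin(), points.end(), lo) - points.begin();
        const auto b = std::lower_bound(points.begin(), points.end(), hi) - points.begin();
        for (std::size_t v = 0; v < kBins; ++v)
            result[i][v] = snapshots[b][v] - snapshots[a][v];
    }
    return result;
}
```

The memory cost is one 52-entry histogram per distinct cut point, so a few hundred ranges need only a few hundred snapshots regardless of how large data is.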
    