Editorial Team
Asked: June 5, 2026


I have a large(ish – >100K) collection mapping a user identifier (an int) to the count of different products that they’ve bought (also an int). I need to reorganise the data as efficiently as possible to find how many users have each number of products: for example, how many users have one product, how many have two products, and so on.

I have achieved this by reversing the original data from a std::map into a std::multimap (where the key and value are simply swapped). I can then pick out the number of users having N products using count(N), although I also stored the distinct counts in a set so I could be sure of exactly which values I was iterating over, and in what order.

Code looks like this:

// uc is a std::map<int, int> containing the  original
// mapping of user identifier to the count of different
// products that they've bought.
std::set<int> uniqueCounts;
std::multimap<int, int> cu; // This maps count to user.

for ( std::map<int, int>::const_iterator it = uc.begin();
        it != uc.end();  ++it )
{
    cu.insert( std::pair<int, int>( it->second, it->first ) );
    uniqueCounts.insert( it->second );
}

// Now write this out
for ( std::set<int>::const_iterator it = uniqueCounts.begin();
        it != uniqueCounts.end();  ++it )
{
    std::cout << "==> There are "
            << cu.count( *it ) << " users that have bought "
            << *it << " product(s)" << std::endl;
}

I just can’t help feeling that this is not the most efficient way of doing this. Anyone know of a clever method of doing this?

I’m limited in that I can’t use Boost or C++11 to do this.

Oh, also, in case anyone is wondering, this is neither homework, nor an interview question.

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 10:14 am

    Assuming you know the maximum number of products that a single user could have bought, you might see better performance by using a vector to store the results of the operation. As it is, you’re going to need an allocation for pretty much every entry in the original map, which likely isn’t the fastest option.

    It would also cut down on the lookup overhead on a map, gain the benefits of memory locality, and replace the call to count on the multimap (which is not a constant time operation) with a constant time lookup of the vector.

    So you could do something like this:

    // One slot per possible count, 0 through MAX_PRODUCTS_PER_USER inclusive.
    std::vector< int > uniqueCounts( MAX_PRODUCTS_PER_USER + 1 );
    
    for ( std::map<int, int>::const_iterator it = uc.begin();
            it != uc.end();  ++it )
    {
        // it->second is the number of products this user bought.
        uniqueCounts[ it->second ]++;
    }
    
    // Now write this out; the index is the product count.
    for ( std::vector< int >::size_type i = 0; i < uniqueCounts.size(); ++i )
    {
        std::cout << "==> There are "
                << uniqueCounts[ i ] << " users that have bought "
                << i << " product(s)" << std::endl;
    }
    

    Even if you don’t know the maximum number of products, it seems like you could just guess a maximum and adapt this code to increase the size of the vector if required. It’s sure to result in fewer allocations than your original example anyway.
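
A minimal sketch of that grow-on-demand variant, written to compile as C++03. The function name `countUsersPerProductCount` and the `initialGuess` parameter are hypothetical names chosen for illustration, and the code assumes every count stored in `uc` is non-negative:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not from the original post): returns a histogram in
// which element n is the number of users who bought exactly n products.
// Assumption: every count stored in uc is non-negative.
std::vector<int> countUsersPerProductCount( const std::map<int, int>& uc,
                                            std::size_t initialGuess )
{
    std::vector<int> histogram( initialGuess, 0 );

    for ( std::map<int, int>::const_iterator it = uc.begin();
            it != uc.end(); ++it )
    {
        const std::size_t n = static_cast<std::size_t>( it->second );
        if ( n >= histogram.size() )
        {
            // The guess was too small: grow so that index n is valid.
            // The newly added slots start at zero.
            histogram.resize( n + 1, 0 );
        }
        ++histogram[ n ];
    }
    return histogram;
}
```

If the initial guess is large enough, the vector is allocated once and never resized; otherwise it is resized only when a count beyond the current size turns up.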

    All this assumes that you don’t need the user ids once the data has been processed, and, as pointed out in the comments, that the number of products bought per user falls in a relatively small, contiguous range. Otherwise you might be better off using a map in place of the vector: you’d still avoid calling multimap::count, but you’d potentially lose some of the other benefits.
