The Archive Base

Editorial Team
Asked: May 31, 2026

I cannot understand how GroupBy() appears to perform faster for a multi pass ResultSelector

I cannot understand how GroupBy() appears to perform faster with a multi-pass resultSelector than with a single-pass version.

Given this class:

    public class DummyItem
    {
        public string Category { get; set; }
        public decimal V1 { get; set; }
        public decimal V2 { get; set; }
    }

I create an array with 100,000 entries with some random data and then iterate the following query:

APPROACH 1: Multiple passes for category totals

    var q = randomData.GroupBy(
        x => x.Category,
        (k, l) => new DummyItem
        {
            Category = k,
            V1 = l.Sum(x => x.V1), // Iterate the items for this category
            V2 = l.Sum(x => x.V2), // Iterate them again
        }
    );

It appears to be double handling the inner enumerable where it sums V1 and V2 for each category.

So I put the following alternative together, presuming that this would provide better performance by calculating category totals in a single pass.

APPROACH 2: Single pass for category totals

    var q = randomData.GroupBy(
        x => x.Category,
        (k, l) => l.Aggregate( // Iterate the inner list once per category
            new decimal[2],
            (t, d) =>
            {
                t[0] += d.V1;
                t[1] += d.V2;
                return t;
            },
            t => new DummyItem { Category = k, V1 = t[0], V2 = t[1] }
        )
    );

Fairly typical results:

    'Multiple pass': iterations=5 average=2,961 ms each
    'Single pass': iterations=5 average=5,146 ms each

Incredibly, Approach 2 takes up to twice as long as Approach 1. I have run numerous benchmarks varying the number of V* properties, the number of distinct categories and other factors. While the magnitude of the performance difference varies, Approach 2 is always substantially slower than Approach 1.
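For anyone who wants to reproduce the comparison, a minimal timing sketch might look like the following. It is not the harness used for the numbers above: `GroupByTiming`, `MakeData`, `AverageMs`, `MultiPass` and `SinglePass` are hypothetical names, and the fixed seed and the 50 categories are assumptions. Each query is materialized with `ToArray()` because LINQ queries are lazy and would otherwise not execute inside the timed region.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public class DummyItem
{
    public string Category { get; set; }
    public decimal V1 { get; set; }
    public decimal V2 { get; set; }
}

public static class GroupByTiming
{
    // Hypothetical helper: reproducible random rows (seed and category count are assumptions).
    public static DummyItem[] MakeData(int count, int categories, int seed = 42)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count)
            .Select(_ => new DummyItem
            {
                Category = "C" + rng.Next(categories),
                V1 = rng.Next(1000),
                V2 = rng.Next(1000),
            })
            .ToArray();
    }

    // Hypothetical helper: mean elapsed milliseconds over `iterations` runs.
    public static double AverageMs(int iterations, Action action)
    {
        action(); // warm-up run so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Approach 1: two Sum() passes over each group.
    public static DummyItem[] MultiPass(DummyItem[] data) =>
        data.GroupBy(
            x => x.Category,
            (k, l) => new DummyItem
            {
                Category = k,
                V1 = l.Sum(x => x.V1),
                V2 = l.Sum(x => x.V2),
            }).ToArray(); // ToArray forces the lazy query to run

    // Approach 2: one Aggregate() pass over each group.
    public static DummyItem[] SinglePass(DummyItem[] data) =>
        data.GroupBy(
            x => x.Category,
            (k, l) => l.Aggregate(
                new decimal[2],
                (t, d) => { t[0] += d.V1; t[1] += d.V2; return t; },
                t => new DummyItem { Category = k, V1 = t[0], V2 = t[1] })).ToArray();

    public static void Main()
    {
        var data = MakeData(100_000, 50);
        double multi = AverageMs(5, () => MultiPass(data));
        double single = AverageMs(5, () => SinglePass(data));
        Console.WriteLine($"'Multiple pass': average={multi:N1} ms each");
        Console.WriteLine($"'Single pass': average={single:N1} ms each");
    }
}
```

Absolute timings will depend on the runtime, build configuration and data, so only the ratio between the two lines is worth comparing.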

Am I missing something fundamental here? How can Approach 1 be faster than Approach 2?

(I sense a facepalm coming…)


* UPDATE *

After @Jirka’s answer I thought it would be worth removing GroupBy() from the picture to see if simple aggregations on a large list performed as expected. The task was simply to compute the totals for the two decimal variables on the same list of 100,000 random rows.

The results continued the surprises:

SUM: ForEach

    decimal t1 = 0M;
    decimal t2 = 0M;
    foreach (var item in randomData)
    {
        t1 += item.V1;
        t2 += item.V2;
    }

This is the baseline, which I believe is the fastest way of getting the required output.

SUM: Multipass

    // Result variables renamed so they do not clash with the lambda parameter x
    decimal t1 = randomData.Sum(x => x.V1);
    decimal t2 = randomData.Sum(x => x.V2);

SUM: Singlepass

    var result = randomData.Aggregate(new DummyItem(), (t, x) =>
    {
        t.V1 += x.V1;
        t.V2 += x.V2;
        return t;
    });

The results were as follows:

    'SUM: ForEach': iterations=10 average=1,793 ms each
    'SUM: Multipass': iterations=10 average=2,030 ms each
    'SUM: Singlepass': iterations=10 average=5,714 ms each

Surprisingly, this reveals that the issue has nothing to do with GroupBy; the behavior is consistent with data aggregation generally. My assumption that it is better to do data aggregation in a single pass is simply wrong (probably a hangover from my db roots).

(facepalm)

As @Jirka has pointed out, the inlining that apparently occurs for the multipass approach means it is only marginally slower than the baseline 'ForEach'. My naive attempt to optimise to a single pass ran almost 3 times slower!

It appears that when dealing with in-memory lists, whatever you wish to do with the items in the list is likely to be a far bigger factor in performance than the iteration overhead.
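One way to test that idea in the GroupBy case would be to keep the single pass but replace Aggregate with the plain foreach from the baseline, moved inside the result selector. The following is only a sketch and has not been timed here; `LoopTotals`, `SumGroup` and `Totals` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DummyItem
{
    public string Category { get; set; }
    public decimal V1 { get; set; }
    public decimal V2 { get; set; }
}

public static class LoopTotals
{
    // Hypothetical helper: totals one category's items in a single foreach pass,
    // accumulating into locals instead of calling a delegate per element.
    public static DummyItem SumGroup(string category, IEnumerable<DummyItem> items)
    {
        decimal t1 = 0M;
        decimal t2 = 0M;
        foreach (var item in items)
        {
            t1 += item.V1;
            t2 += item.V2;
        }
        return new DummyItem { Category = category, V1 = t1, V2 = t2 };
    }

    // Groups by category and totals each group with the loop above.
    public static List<DummyItem> Totals(IEnumerable<DummyItem> data) =>
        data.GroupBy(x => x.Category, (k, l) => SumGroup(k, l)).ToList();

    public static void Main()
    {
        var sample = new[]
        {
            new DummyItem { Category = "A", V1 = 1M, V2 = 10M },
            new DummyItem { Category = "B", V1 = 5M, V2 = 50M },
            new DummyItem { Category = "A", V1 = 2M, V2 = 20M },
        };
        foreach (var total in Totals(sample))
        {
            Console.WriteLine($"{total.Category}: V1={total.V1}, V2={total.V2}");
        }
    }
}
```

The design choice is the same one the ForEach baseline makes: the per-item work is two decimal additions on local variables, with a single delegate call per group.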

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 9:46 am

    Aggregate has to create roughly 100,000 activation records (one non-inlineable delegate call per element) in the process. That offsets the advantage of the single pass.

    Think of Count, Sum, Average etc. as optimized special cases of what Aggregate can do in the general case.
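To make the "special case" relationship concrete, here is a small sketch that expresses Sum, Count and Average through Aggregate and compares them with the built-in operators. The class and method names are hypothetical, and the sketch only demonstrates equivalence of results, not how the built-in operators are implemented.

```csharp
using System;
using System.Linq;

public static class AggregateSpecialCases
{
    // Sum as a fold: start at 0 and add each value.
    public static decimal SumViaAggregate(decimal[] values) =>
        values.Aggregate(0M, (total, v) => total + v);

    // Count as a fold: start at 0 and add 1 per element, ignoring the value.
    public static int CountViaAggregate(decimal[] values) =>
        values.Aggregate(0, (n, v) => n + 1);

    // Average as a fold over a (Sum, Count) pair, finished by a result selector.
    // Like Enumerable.Average, this throws on an empty array (here: divide by zero).
    public static decimal AverageViaAggregate(decimal[] values) =>
        values.Aggregate(
            (Sum: 0M, Count: 0),
            (acc, v) => (acc.Sum + v, acc.Count + 1),
            acc => acc.Sum / acc.Count);

    public static void Main()
    {
        var values = new[] { 1M, 2M, 3M, 4M };
        Console.WriteLine(SumViaAggregate(values) == values.Sum());
        Console.WriteLine(CountViaAggregate(values) == values.Count());
        Console.WriteLine(AverageViaAggregate(values) == values.Average());
    }
}
```

Each of these calls the accumulator delegate once per element, which is the per-element cost the answer is pointing at.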
