The Archive Base

Asked by Editorial Team on June 9, 2026

How would I go about applying an aggregating function (such as sum() or max()) to bins in a vector?


How would I go about applying an aggregating function (such as "sum()" or "max()") to bins in a vector?

That is, if I have:

  1. a vector of values x of length N
  2. a vector of bin tags b of length N

such that b indicates which bin each value in x belongs to.
For every distinct value in b, I want to apply the aggregating function "func()" to all the values of x that belong to that bin.

>>> x = [1, 2, 3, 4, 5, 6]
>>> b = ["a", "b", "a", "a", "c", "c"]

The output should be two vectors (say the aggregating function is the product):

>>> labels, y = apply_to_bins(values=x, bins=b, func=prod)

labels = ["a", "b", "c"]
y = [12, 2, 30]

I would like to do this as elegantly as possible in NumPy (or plain Python); obviously I could just write a for loop over it.
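To pin down the expected semantics, here is a minimal plain-Python sketch of the hypothetical apply_to_bins from the question. It collects the values of each bin in a dict keyed by bin tag and then applies func to each group. Returning the labels sorted is an assumption (the question's example happens to be in sorted order), and math.prod (Python 3.8+) stands in for the product function. It is still a single Python-level loop over the data, which is exactly what the question hopes to avoid, so treat it as a reference rather than a fast solution.

```python
import math
from collections import defaultdict

def apply_to_bins(values, bins, func):
    """Hypothetical helper: apply `func` to the values sharing each bin tag."""
    groups = defaultdict(list)
    # Collect the values belonging to each bin tag in a single pass.
    for value, tag in zip(values, bins):
        groups[tag].append(value)
    # Sort the labels so the output order does not depend on input order.
    labels = sorted(groups)
    y = [func(groups[label]) for label in labels]
    return labels, y

x = [1, 2, 3, 4, 5, 6]
b = ["a", "b", "a", "a", "c", "c"]
labels, y = apply_to_bins(values=x, bins=b, func=math.prod)
print(labels, y)  # ['a', 'b', 'c'] [12, 2, 30]
```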


1 Answer

  1. Answer by Editorial Team, June 9, 2026 at 3:04 am

    With pandas groupby, this would be:

    import pandas as pd
    
    def with_pandas_groupby(func, x, b):
        # Group the values in x by their bin tag in b, then reduce each group with func.
        grouped = pd.Series(x).groupby(b)
        return grouped.agg(func)
    

    Using the example of the OP:

    >>> import numpy as np
    >>> x = [1, 2, 3, 4, 5, 6]
    >>> b = ["a", "b", "a", "a", "c", "c"]
    >>> with_pandas_groupby(np.prod, x, b)
    a    12
    b     2
    c    30
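To get the two vectors the question asks for, the Series returned by the groupby can be split into its index (the labels) and its values. This is a sketch, assuming pandas is installed; the wrapper name apply_to_bins_pandas is hypothetical, and passing the string "prod" instead of np.prod is a substitution that asks pandas for its built-in group product.

```python
import pandas as pd

def apply_to_bins_pandas(func, x, b):
    """Hypothetical wrapper: return (labels, y) arrays like the question's apply_to_bins."""
    # groupby sorts the group keys by default, so the labels come out sorted.
    result = pd.Series(x).groupby(b).agg(func)
    return result.index.to_numpy(), result.to_numpy()

labels, y = apply_to_bins_pandas("prod", [1, 2, 3, 4, 5, 6], ["a", "b", "a", "a", "c", "c"])
print(labels.tolist(), y.tolist())  # ['a', 'b', 'c'] [12, 2, 30]
```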
    

    I was curious about the speed, so I compared with_pandas_groupby with some of the functions from senderle's answer.

    Time per loop (IPython %timeit):

    levels   values   apply_to_bins_groupby   apply_to_bins3   with_pandas_groupby
         3      100                  175 µs           136 µs                365 µs
         3     1000                 1.16 ms           259 µs                443 µs
         3  1000000                  1.21 s           205 ms               89.4 ms
        10      100                  304 µs           297 µs                369 µs
        10     1000                 1.32 ms           447 µs                453 µs
        10  1000000                  1.23 s           262 ms               88.8 ms
        26      100                  554 µs           617 µs                382 µs
        26     1000                 1.59 ms           795 µs                466 µs
        26  1000000                  1.27 s           299 ms               89.9 ms

    So pandas is the fastest for large inputs (1,000,000 values). Furthermore, for pandas the number of levels (bins) has little influence on computation time.
    (Note that the timings start from NumPy arrays, so the time to create the pandas.Series is included.)
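If the aggregating function is a NumPy ufunc (for example np.multiply for products or np.maximum for max), a NumPy-only alternative avoids a Python-level loop over the bins: sort the values by bin, then reduce each contiguous run with ufunc.reduceat. This sketch was not part of the timings above, the name apply_ufunc_to_bins is hypothetical, and it only handles ufunc reductions, not arbitrary callables.

```python
import numpy as np

def apply_ufunc_to_bins(values, bins, ufunc):
    """Hypothetical helper: reduce `values` within each bin using a NumPy ufunc."""
    values = np.asarray(values)
    # labels: sorted unique bin tags; inverse: index into labels for each element.
    labels, inverse = np.unique(bins, return_inverse=True)
    inverse = inverse.ravel()  # keep it 1-D regardless of NumPy version
    # A stable sort makes the members of each bin contiguous.
    order = np.argsort(inverse, kind="stable")
    # Start offset of each bin in the sorted array. Every label occurs at least
    # once, so each slice handed to reduceat is non-empty.
    starts = np.searchsorted(inverse[order], np.arange(labels.size))
    return labels, ufunc.reduceat(values[order], starts)

labels, y = apply_ufunc_to_bins([1, 2, 3, 4, 5, 6], ["a", "b", "a", "a", "c", "c"], np.multiply)
print(labels.tolist(), y.tolist())  # ['a', 'b', 'c'] [12, 2, 30]
```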

    I generated the data with:

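    import numpy as np

    def gen_data(nlevels, size):
        # Use the first `nlevels` letters as bin labels.
        choices = 'abcdefghijklmnopqrstuvwxyz'
        levels = np.array(list(choices[:nlevels]))
        # Assign each value to a random bin (randint's upper bound is exclusive).
        index = np.random.randint(0, nlevels, size)
        b = levels[index]
        x = np.arange(1, size + 1)
        return x, b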
    

    and then ran the benchmark in IPython like this, with function_to_time standing for each of the compared functions in turn:

    In [174]: for nlevels in (3, 10, 26):
       .....:     for size in (100, 1000, 10**6):
       .....:         x, b = gen_data(nlevels, size)
       .....:         print('%2d levels, %7d values:' % (nlevels, size), end=' ')
       .....:         %timeit function_to_time(np.prod, x, b)
       .....:     print()
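Outside IPython, a similar loop can be written with the standard-library timeit module. This is a sketch: gen_data and with_pandas_groupby are repeated here so the block runs on its own, run_benchmark is a hypothetical helper, the repetition count is arbitrary, and the numbers it prints will not match the table above on other hardware or library versions.

```python
import timeit

import numpy as np
import pandas as pd

def gen_data(nlevels, size):
    # Random bin labels drawn from the first `nlevels` letters.
    levels = np.array(list("abcdefghijklmnopqrstuvwxyz"[:nlevels]))
    b = levels[np.random.randint(0, nlevels, size)]
    x = np.arange(1, size + 1)
    return x, b

def with_pandas_groupby(func, x, b):
    return pd.Series(x).groupby(b).agg(func)

def run_benchmark(function_to_time, sizes=(100, 1000, 10**6), number=3):
    """Return (nlevels, size, average seconds per call) for each configuration."""
    results = []
    for nlevels in (3, 10, 26):
        for size in sizes:
            x, b = gen_data(nlevels, size)
            total = timeit.timeit(lambda: function_to_time(np.prod, x, b), number=number)
            results.append((nlevels, size, total / number))
    return results

if __name__ == "__main__":
    for nlevels, size, seconds in run_benchmark(with_pandas_groupby):
        print("%2d levels, %7d values: %.3g ms per loop" % (nlevels, size, seconds * 1e3))
```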
    


Related Questions

Using STL in C++, how would I go about applying a function to each
I would like to know how I would go about altering the HTML that
Anyone have any idea how I would go about converting a timestamp in milliseconds
I am wondering how I would go about performing a certain function if a
How would I go about applying each element in a list to each argument
How would you go about applying a background color with CSS only, while only
Q: How would I go about using / applying the row number of each
I would like 'about' to route to 'abouts/1' I tried this: match 'about' =>
Any ideas how I would go about writing a javascript method to insert an
Just a quick question about how you would go about implementing this. I want

© 2021 The Archive Base. All Rights Reserved