The Archive Base

Editorial Team
Asked: June 15, 2026

Another pandas question.

Reading Wes McKinney’s excellent book on data analysis with pandas, I ran into something that I thought should work:

Suppose I have some info about tips.

In [119]: tips.head()
Out[119]:
   total_bill   tip     sex  smoker  day    time  size   tip_pct
0       16.99  1.01  Female   False  Sun  Dinner     2  0.059447
1       10.34  1.66    Male   False  Sun  Dinner     3  0.160542
2       21.01  3.50    Male   False  Sun  Dinner     3  0.166587
3       23.68  3.31    Male   False  Sun  Dinner     2  0.139780
4       24.59  3.61  Female   False  Sun  Dinner     4  0.146808

and I want to know the five largest tips in relation to the total bill, that is, tip_pct for smokers and non-smokers separately. So this works:

def top(df, n=5, column='tip_pct'):
    # sort_values(by=...) is the current spelling of the older sort_index(by=...)
    return df.sort_values(by=column)[-n:]

In [101]: tips.groupby('smoker').apply(top)
Out[101]:
            total_bill   tip     sex  smoker   day    time  size   tip_pct
smoker
False  88        24.71  5.85    Male   False  Thur   Lunch     2  0.236746
       185       20.69  5.00    Male   False   Sun  Dinner     5  0.241663
       51        10.29  2.60  Female   False   Sun  Dinner     2  0.252672
       149        7.51  2.00    Male   False  Thur   Lunch     2  0.266312
       232       11.61  3.39    Male   False   Sat  Dinner     2  0.291990
True   109       14.31  4.00  Female    True   Sat  Dinner     2  0.279525
       183       23.17  6.50    Male    True   Sun  Dinner     4  0.280535
       67         3.07  1.00  Female    True   Sat  Dinner     1  0.325733
       178        9.60  4.00  Female    True   Sun  Dinner     2  0.416667
       172        7.25  5.15    Male    True   Sun  Dinner     2  0.710345

Good enough, but then I wanted to use pandas’ transform to do the same like this:

def top_all(df):
    # sort_values(by=...) is the current spelling of the older sort_index(by=...)
    return df.sort_values(by='tip_pct')

tips.groupby('smoker').transform(top_all)

but instead I get this:

TypeError: Transform function invalid for data types

Why? I know that transform requires the function to return an array with the same dimensions as its input, so I thought I’d be complying with that requirement by just sorting both slices (smokers and non-smokers) of the original DataFrame without changing their respective dimensions. Can anyone explain why it failed?

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 9:28 pm

    transform is not that well documented, but it seems that the function you give it is passed not the entire group as a DataFrame, but a single column of a single group. I don’t think it’s really meant for what you’re trying to do, and your solution with apply is fine.

    So suppose you call tips.groupby('smoker').transform(func). There will be two groups; call them group1 and group2. transform does not call func(group1) and func(group2). Instead, it calls func(group1['total_bill']), then func(group1['tip']), etc., and then func(group2['total_bill']), func(group2['tip']), and so on. Here’s an example:

    >>> print(d)
       A  B  C
    0 -2  5  4
    1  1 -1  2
    2  0  2  1
    3 -3  1  2
    4  5  0  2
    >>> def foo(col):
    ...     # col is one column of one group, not the whole group
    ...     print(">>>")
    ...     print(col)
    ...     print("<<<")
    ...     return col
    ...
    >>> print(d.groupby('C').transform(foo))
    >>>
    2    0
    Name: A
    <<<
    >>>
    2    2
    Name: B
    <<<
    >>>
    1    1
    3   -3
    4    5
    Name: A
    <<<
    >>>
    1   -1
    3    1
    4    0
    Name: B
    # etc.
    

    You can see that foo is first called with just the A column of the C=1 group of the original data frame, then the B column of that group, then the A column of the C=2 group, etc.
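To check this on a current pandas install, here is a minimal Python 3 sketch of the same experiment. It rebuilds the frame d from the session above; the names record and series_names are made up for illustration. It only records the calls that receive a Series, because recent pandas releases may also try the function on the whole group as a shortcut, so the exact call sequence can differ from the output shown above.

```python
import pandas as pd

# Same data as the frame d in the session above.
d = pd.DataFrame({
    "A": [-2, 1, 0, -3, 5],
    "B": [5, -1, 2, 1, 0],
    "C": [4, 2, 1, 2, 2],
})

series_names = []  # column name of every Series that transform passes in

def record(obj):
    # Note which single columns we are handed, then return the input unchanged.
    if isinstance(obj, pd.Series):
        series_names.append(obj.name)
    return obj

result = d.groupby("C").transform(record)

print(series_names)  # column names, repeated once or more per group
print(result)        # identity transform: columns A and B in the original row order
```

Because record returns its input untouched, the result has the same rows as the non-grouping columns of d, which is the shape contract transform expects.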

    This makes sense if you think about what transform is for. It’s meant for applying column-wise transformations within each group. In general, such functions don’t make sense when applied to the entire group, only to a single column. For instance, the example in the pandas docs z-standardizes with transform. If you have a DataFrame with columns for age and weight, it wouldn’t make sense to z-standardize with respect to the overall mean of both variables; the overall mean of a bunch of numbers, some of which are ages and some of which are weights, doesn’t mean anything. You have to z-standardize age with respect to the mean age and weight with respect to the mean weight, which means you want to transform each column separately.
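A minimal sketch of that z-standardizing idea, assuming a made-up people frame with group, age and weight columns (the data and the names people and zscore are hypothetical, not taken from the question or the pandas docs). Each column is standardized against its own group mean and standard deviation.

```python
import pandas as pd

# Hypothetical data: two groups, two numeric columns on different scales.
people = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "age": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
    "weight": [60.0, 70.0, 80.0, 65.0, 75.0, 85.0],
})

def zscore(col):
    # Standardize against the column's own mean and (sample) standard deviation.
    return (col - col.mean()) / col.std()

# The result has one row per input row, in the original order,
# with the grouping column left out.
standardized = people.groupby("group").transform(zscore)
print(standardized)
```

Within each group, age and weight both come out as roughly -1, 0, 1 here, even though their raw scales differ, which is the point of doing it per column.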

    So basically, you don’t need transform here. apply is the appropriate function, because apply really does operate on each group as a single DataFrame, while transform operates on each column of each group.
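As a rough illustration of apply receiving whole groups, here is a sketch on a tiny hypothetical tips-like frame (the values are invented, and seen_types is a made-up helper list). It uses sort_values, the current spelling of sort_index(by=...), and selects the data columns explicitly so that only those columns are passed to the function.

```python
import pandas as pd

# Hypothetical stand-in for the tips data in the question.
tips = pd.DataFrame({
    "total_bill": [20.0, 12.0, 30.0, 10.0, 40.0, 25.0],
    "smoker": [False, False, False, True, True, True],
    "tip_pct": [0.10, 0.25, 0.15, 0.30, 0.05, 0.20],
})

seen_types = []  # type of every object that apply passes to top

def top(df, n=2, column="tip_pct"):
    # df is a whole group here, so sorting by a column name works.
    seen_types.append(type(df).__name__)
    return df.sort_values(by=column)[-n:]

top_rows = tips.groupby("smoker")[["total_bill", "tip_pct"]].apply(top)
print(top_rows)
print(set(seen_types))
```

Every call sees a DataFrame, so the function can refer to tip_pct by name, which is what the sorting in the question needs.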
