The Archive Base

Asked by Editorial Team on June 9, 2026

I would like to partially collapse a DataFrame/matrix and keep the structure intact

I would like to partially “collapse” a DataFrame/matrix and keep the structure intact by just summing the condensed values. For example, I have this:

CHROM     POS     GENE     DESC     JOE      FRED   BILLY    SUSAN    TONY
10        1442    LOXL4    bad      1        0      0        1        0
10        335     LOXL4    bad      1        0      0        0        0
10        3438    LOXL4    good     0        0      1        0        0
10        4819    PYROXD2  bad      0        1      0        0        0
10        4829    PYROXD2  bad      0        1      0        1        0
10        9851    HPS1     good     1        0      0        0        0

The first 4 columns are descriptors, and the last 5 columns are people/observations. The end goal is to count the total number of “good” and “bad” observations per GENE per person. Thus, I want this:

GENE     DESC     JOE      FRED   BILLY    SUSAN    TONY
LOXL4    bad      2        0      0        1        0
LOXL4    good     0        0      1        0        0
PYROXD2  bad      0        2      0        1        0
HPS1     good     1        0      0        0        0

The following code collapses all the individual observations (Joe, Fred, etc.) into a single count per group. How can I keep them separate? I would also like the solution to be flexible enough to accommodate more individuals in the future (keeping the same 4 descriptor columns).

mytable.groupby(['GENE','DESC']).size()
1 Answer

    Answered by Editorial Team on June 9, 2026 at 9:26 am

    Just use the aggregate method of the groupby object:

    In [156]: df
    Out[156]: 
       CHROM   POS     GENE  DESC  JOE  FRED  BILLY  SUSAN  TONY
    0     10  1442    LOXL4   bad    1     0      0      1     0
    1     10   335    LOXL4   bad    1     0      0      0     0
    2     10  3438    LOXL4  good    0     0      1      0     0
    3     10  4819  PYROXD2   bad    0     1      0      0     0
    4     10  4829  PYROXD2   bad    0     1      0      1     0
    5     10  9851     HPS1  good    1     0      0      0     0
    
    In [157]: grouped = df.groupby(['GENE', 'DESC'])
    
    In [158]: grouped.agg(np.sum) # agg is a shortcut for aggregate
    Out[158]: 
                  CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
    GENE    DESC                                            
    HPS1    good     10  9851    1     0      0      0     0
    LOXL4   bad      20  1777    2     0      0      1     0
            good     10  3438    0     0      1      0     0
    PYROXD2 bad      20  9648    0     2      0      1     0
    

    As mentioned by Daniel Velkow in the comment, the groupby object has some built-in methods for simple aggregations such as sum and mean (much like the NumPy ufuncs that are also available as methods on NumPy arrays). So the last step can be simplified further to

    In [159]: grouped.sum()
    Out[159]: 
                  CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
    GENE    DESC                                            
    HPS1    good     10  9851    1     0      0      0     0
    LOXL4   bad      20  1777    2     0      0      1     0
            good     10  3438    0     0      1      0     0
    PYROXD2 bad      20  9648    0     2      0      1     0
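
    The sums above still include CHROM and POS, which the question did not ask for. A minimal sketch of one way to restrict the sum to the person columns follows. The names `descriptors`, `people`, and `counts` are made up for illustration, and it assumes that every column other than the four descriptors is a person, so new individuals are picked up without changing the code.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "CHROM": [10, 10, 10, 10, 10, 10],
    "POS": [1442, 335, 3438, 4819, 4829, 9851],
    "GENE": ["LOXL4", "LOXL4", "LOXL4", "PYROXD2", "PYROXD2", "HPS1"],
    "DESC": ["bad", "bad", "good", "bad", "bad", "good"],
    "JOE": [1, 1, 0, 0, 0, 1],
    "FRED": [0, 0, 0, 1, 1, 0],
    "BILLY": [0, 0, 1, 0, 0, 0],
    "SUSAN": [1, 0, 0, 0, 1, 0],
    "TONY": [0, 0, 0, 0, 0, 0],
})

# Assumption: the descriptor columns are fixed; everything else is a person.
descriptors = ["CHROM", "POS", "GENE", "DESC"]
people = [c for c in df.columns if c not in descriptors]

counts = (
    df.groupby(["GENE", "DESC"], sort=False)[people]  # sort=False keeps first-seen group order
      .sum()
      .reset_index()  # turn GENE and DESC back into ordinary columns
)
print(counts)
```

    Selecting `[people]` before calling `sum()` keeps CHROM and POS out of the result, and `reset_index()` returns a flat table shaped like the one requested in the question.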
    

    If you want different operations on each column, according to the docs you can pass a dict to aggregate.
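
    As a rough illustration of the dict form, the sketch below maps column names to aggregation names. The choice of columns and functions is arbitrary and only meant to show the shape of the call; columns that are not listed in the dict do not appear in the result.

```python
import pandas as pd

# Same example table as in the question.
df = pd.DataFrame({
    "CHROM": [10, 10, 10, 10, 10, 10],
    "POS": [1442, 335, 3438, 4819, 4829, 9851],
    "GENE": ["LOXL4", "LOXL4", "LOXL4", "PYROXD2", "PYROXD2", "HPS1"],
    "DESC": ["bad", "bad", "good", "bad", "bad", "good"],
    "JOE": [1, 1, 0, 0, 0, 1],
    "FRED": [0, 0, 0, 1, 1, 0],
    "BILLY": [0, 0, 1, 0, 0, 0],
    "SUSAN": [1, 0, 0, 0, 1, 0],
    "TONY": [0, 0, 0, 0, 0, 0],
})

grouped = df.groupby(["GENE", "DESC"])

# One aggregation per column; the mapping here is only an example.
result = grouped.agg({
    "CHROM": "mean",  # average chromosome number within the group
    "POS": "min",     # smallest position within the group
    "JOE": "sum",
    "FRED": "sum",
})
print(result)
```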

    However, I found no way to specify a function for a single column and use a default for the others. One workaround is to define a custom aggregation function:

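    import numpy as np

    def custom_agg(s, default=np.sum, other=None):
        # Use the function registered for this column name in `other`,
        # falling back to `default` for every other column.
        func = (other or {}).get(s.name, default)
        return func(s)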
    

    and then apply it by passing the function and its arguments to agg:

    In [59]: grouped.agg(custom_agg, default=np.sum, other={'CHROM': np.mean})
    Out[59]: 
                  CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
    GENE    DESC                                            
    HPS1    good     10  9851    1     0      0      0     0
    LOXL4   bad      10  1777    2     0      0      1     0
            good     10  3438    0     0      1      0     0
    PYROXD2 bad      10  9648    0     2      0      1     0
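
    A possible alternative to the custom function, sketched here under the assumption that plain aggregation names are enough, is to build the dict for agg programmatically: start with a default for every column and then overwrite individual entries. The helper `build_agg_spec` and the names `value_cols` and `spec` are hypothetical and not part of pandas.

```python
import pandas as pd

# Same example table as in the question.
df = pd.DataFrame({
    "CHROM": [10, 10, 10, 10, 10, 10],
    "POS": [1442, 335, 3438, 4819, 4829, 9851],
    "GENE": ["LOXL4", "LOXL4", "LOXL4", "PYROXD2", "PYROXD2", "HPS1"],
    "DESC": ["bad", "bad", "good", "bad", "bad", "good"],
    "JOE": [1, 1, 0, 0, 0, 1],
    "FRED": [0, 0, 0, 1, 1, 0],
    "BILLY": [0, 0, 1, 0, 0, 0],
    "SUSAN": [1, 0, 0, 0, 1, 0],
    "TONY": [0, 0, 0, 0, 0, 0],
})


def build_agg_spec(columns, default="sum", overrides=None):
    """Hypothetical helper: map every column to `default`, then apply overrides."""
    spec = {col: default for col in columns}
    spec.update(overrides or {})
    return spec


keys = ["GENE", "DESC"]
value_cols = [c for c in df.columns if c not in keys]

# Sum everything except CHROM, which is averaged instead.
spec = build_agg_spec(value_cols, overrides={"CHROM": "mean"})
result = df.groupby(keys).agg(spec)
print(result)
```

    Because the spec is derived from `df.columns`, columns for additional individuals are summed automatically. Keys in `overrides` should name columns that exist in the frame.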
    
© 2021 The Archive Base. All Rights Reserved