I would like to partially collapse a DataFrame /matrix and keep the structure intact

Question

0

Asked: June 9, 20262026-06-09T09:26:48+00:00 2026-06-09T09:26:48+00:00

I would like to partially collapse a DataFrame /matrix and keep the structure intact

0

I would like to partially “collapse” a DataFrame/matrix and keep the structure intact by just summing the condensed values. For example, I have this:

CHROM     POS     GENE     DESC     JOE      FRED   BILLY    SUSAN    TONY
10        1442    LOXL4    bad      1        0      0        1        0
10        335     LOXL4    bad      1        0      0        0        0
10        3438    LOXL4    good     0        0      1        0        0
10        4819    PYROXD2  bad      0        1      0        0        0
10        4829    PYROXD2  bad      0        1      0        1        0
10        9851    HPS1     good     1        0      0        0        0

The first 4 columns are descriptors, and the last 4 columns are people/observations. The end goal is to count how many total “good” and “bad” observations per GENE per person. Thus, I want this:

GENE     DESC     JOE      FRED   BILLY    SUSAN    TONY
LOXL4    bad      2        0      0        1        0
LOXL4    good     0        0      1        0        0
PYROXD2  bad      0        2      0        1        0
HPS1     good     1        0      0        0        0

The following code collapses all the individual observations (Joe, Fred, etc), how can I keep them separate? I would also like to be flexible enough to accommodate a more individuals in the future (keeping the same 4 descriptor columns)

mytable.groupby(['GENE','DESC']).size()

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-09T09:26:51+00:00

Just use the aggregate method of the groupby object:

In [156]: df
Out[156]: 
   CHROM   POS     GENE  DESC  JOE  FRED  BILLY  SUSAN  TONY
0     10  1442    LOXL4   bad    1     0      0      1     0
1     10   335    LOXL4   bad    1     0      0      0     0
2     10  3438    LOXL4  good    0     0      1      0     0
3     10  4819  PYROXD2   bad    0     1      0      0     0
4     10  4829  PYROXD2   bad    0     1      0      1     0
5     10  9851     HPS1  good    1     0      0      0     0

In [157]: grouped = df.groupby(['GENE', 'DESC'])

In [158]: grouped.agg(np.sum) # agg is a shortcut for aggregate
Out[158]: 
              CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
GENE    DESC                                            
HPS1    good     10  9851    1     0      0      0     0
LOXL4   bad      20  1777    2     0      0      1     0
        good     10  3438    0     0      1      0     0
PYROXD2 bad      20  9648    0     2      0      1     0

As mentioned by Daniel Velkow in the comment, the groupby object has some “build in” methods for simple aggregations like sum, mean, … (something like ufuncs in numpy which are available as methods for numpy arrays). So the last step could be further simplified to

In [159]: grouped.sum()
Out[159]: 
              CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
GENE    DESC                                            
HPS1    good     10  9851    1     0      0      0     0
LOXL4   bad      20  1777    2     0      0      1     0
        good     10  3438    0     0      1      0     0
PYROXD2 bad      20  9648    0     2      0      1     0

If you want different operations on each column, according to the docs you can pass a dict to aggregate.

However I found no way to specify a function for a single column and use a default for others. So one way would be to define a custom aggregation function:

def custom_agg(s, default=np.sum, other={}):
    if s.name in other.keys():
        return other[s.name](s)
    else:
        return default(s)

and than apply it by passing the function and the args to agg:

In [59]: grouped.agg(custom_agg, default=np.sum, other={'CHROM': np.mean})
Out[59]: 
              CHROM   POS  JOE  FRED  BILLY  SUSAN  TONY
GENE    DESC                                            
HPS1    good     10  9851    1     0      0      0     0
LOXL4   bad      10  1777    2     0      0      1     0
        good     10  3438    0     0      1      0     0
PYROXD2 bad      10  9648    0     2      0      1     0

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I would like to partially collapse a DataFrame /matrix and keep the structure intact

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply