I have a dataset like so: testdata <- read.table(header=T, text=’ patids labels dbins vprobs

Question

0

Asked: June 17, 20262026-06-17T12:32:41+00:00 2026-06-17T12:32:41+00:00

I have a dataset like so: testdata <- read.table(header=T, text=’ patids labels dbins vprobs

0

I have a dataset like so:

testdata <- read.table(header=T, text='
patids labels dbins vprobs Response 
16186 SUP0         0.0         100        1 
16186 SUP0         0.2         99        1 
16186 SUP0         0.4         95        1 
16186 SUP0         0.6         99        1 
16186 SUP0         0.8         50        1 
16186 SUP0         1.0         0        1 
18185 SUP0         0.0         100        0 
18185 SUP0         0.2         100        0 
18185 SUP0         0.4         5        0 
18185 SUP0         0.6         2        0 
18185 SUP0         0.8         0        0 
54234 INF0         0.0         100        1 
54234 INF0         0.2         95        1 
54234 INF0         0.4         90        1 
54234 INF0         0.6         30        1 
54234 INF0         0.8         0        1 
18185 INF0         0.0         100        0 
18185 INF0         0.2         20        0 
18185 INF0         0.4         10        0 
18185 INF0         0.6         5        0 
18185 INF0         0.8         3        0 
18185 INF0         1.0         0        0 
16186 INF0         0.0         100        1 
16186 INF0         0.2         100        1 
16186 INF0         0.4         70        1 
16186 INF0         0.6         60        1 
16186 INF0         0.8         50        1 
16186 INF0         1.0         0        1 
54234 SUP1         0.0         100        1 
54234 SUP1         0.2         95        1 
54234 SUP1         0.4         90        1 
54234 SUP1         0.6         30        1 
54234 SUP1         0.8         0        1 
18185 SUP1         0.0         100        0 
18185 SUP1         0.2         50        0 
18185 SUP1         0.4         0        0
16186 SUP1         0.0         100        1 
16186 SUP1         0.2         100        1 
16186 SUP1         0.4         40        1 
16186 SUP1         0.6         10        1 
16186 SUP1         0.8         22        1 
16186 SUP1         1.0         0        1 ')

Now, for each “labels”, i.e SUP0, SUP1 e.t.c, I want to obtain the mean of the variable dbins ( mean taken over all unique “patids” variables. The problem I am facing is that the “dbins” are not all of same length for each “patids”. Is there some way to fill with NAs or 0’s before taking this means ? My expected output has to be like this:

for SUP0

labels dbins dbins.16186 dbins.18185
SUP0         0.0         0.0 
SUP0         0.2         0.2          
SUP0         0.4         0.4          
SUP0         0.6         0.6          
SUP0         0.8         0.8          
SUP0         1.0         NA

and for INF0

labels      dbins.54234 dbins.18185 dbins.16186
INF0         0.0         0.0        0.0    0.0    
INF0         0.2         0.2        0.0    0.2   
INF0         0.4         0.4        0.0    0.4   
INF0         0.6         0.6        0.0    0.6   
INF0         0.8         0.8        0.8    0.8   
INFO         NA          1.0        1.0    1.0

…so that I can take mean over columns.

I have been trying with ddply and simillar functions but I can’t get this particular output format. Can someone please help ?

Thanks in advance

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-17T12:32:41+00:00

The answer you want could be one of two things.

The exact output you’ve suggested.
The means of each of the categories (for which the output you’ve provided is just a method of getting there)

I’m going to use plyr and reshape2, but no doubt @mnel will be around soon to give a data.table solution.

1. The output you’ve suggested

The problem here is that you have several groups with multiple elements. So first, we need to group the elements (using @Maiasaura’s solution here).

library(plyr)
testgroups <- ddply(testdata, .(labels, patids), function(x) { x$group <- 1:nrow(x); x })

Then we can reshape them properly:

library(reshape2)
testreshape <- dcast(testgroups[,c("labels", "patids", "dbins", "group")], labels+group~patids, value.var="dbins")

   labels group 16186 18185 54234
1    INF0     1   0.0   0.0   0.0
2    INF0     2   0.2   0.2   0.2
3    INF0     3   0.4   0.4   0.4
4    INF0     4   0.6   0.6   0.6
5    INF0     5   0.8   0.8   0.8
6    INF0     6   1.0   1.0    NA
7    SUP0     1   0.0   0.0    NA
8    SUP0     2   0.2   0.2    NA
9    SUP0     3   0.4   0.4    NA
10   SUP0     4   0.6   0.6    NA
11   SUP0     5   0.8   0.8    NA
12   SUP0     6   1.0    NA    NA
13   SUP1     1   0.0   0.0   0.0
14   SUP1     2   0.2   0.2   0.2
15   SUP1     3   0.4   0.4   0.4
16   SUP1     4   0.6    NA   0.6
17   SUP1     5   0.8    NA   0.8
18   SUP1     6   1.0    NA    NA

From here you can use something like testreshape[testreshape$labels=="INF0",]

2. The means of each of the categories

This is much simpler:

library(plyr)
testmeans <- ddply(testdata, .(labels, patids), summarise, mean=mean(dbins, na.rm=TRUE))

  labels patids mean
1   INF0  16186  0.5
2   INF0  18185  0.5
3   INF0  54234  0.4
4   SUP0  16186  0.5
5   SUP0  18185  0.4
6   SUP1  16186  0.5
7   SUP1  18185  0.2
8   SUP1  54234  0.4

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a dataset like so: testdata <- read.table(header=T, text=’ patids labels dbins vprobs

Leave an answerCancel reply

1 Answer

1. The output you’ve suggested

2. The means of each of the categories

Leave an answer
Cancel reply