I have a dataset like so:
testdata <- read.table(header=T, text='
patids labels dbins vprobs Response
16186 SUP0 0.0 100 1
16186 SUP0 0.2 99 1
16186 SUP0 0.4 95 1
16186 SUP0 0.6 99 1
16186 SUP0 0.8 50 1
16186 SUP0 1.0 0 1
18185 SUP0 0.0 100 0
18185 SUP0 0.2 100 0
18185 SUP0 0.4 5 0
18185 SUP0 0.6 2 0
18185 SUP0 0.8 0 0
54234 INF0 0.0 100 1
54234 INF0 0.2 95 1
54234 INF0 0.4 90 1
54234 INF0 0.6 30 1
54234 INF0 0.8 0 1
18185 INF0 0.0 100 0
18185 INF0 0.2 20 0
18185 INF0 0.4 10 0
18185 INF0 0.6 5 0
18185 INF0 0.8 3 0
18185 INF0 1.0 0 0
16186 INF0 0.0 100 1
16186 INF0 0.2 100 1
16186 INF0 0.4 70 1
16186 INF0 0.6 60 1
16186 INF0 0.8 50 1
16186 INF0 1.0 0 1
54234 SUP1 0.0 100 1
54234 SUP1 0.2 95 1
54234 SUP1 0.4 90 1
54234 SUP1 0.6 30 1
54234 SUP1 0.8 0 1
18185 SUP1 0.0 100 0
18185 SUP1 0.2 50 0
18185 SUP1 0.4 0 0
16186 SUP1 0.0 100 1
16186 SUP1 0.2 100 1
16186 SUP1 0.4 40 1
16186 SUP1 0.6 10 1
16186 SUP1 0.8 22 1
16186 SUP1 1.0 0 1 ')
Now, for each value of “labels” (i.e. SUP0, SUP1, etc.), I want to obtain the mean of the variable “dbins”, taken over all unique “patids”. The problem I am facing is that “dbins” does not have the same length for each “patids”. Is there some way to fill with NAs or 0s before taking the mean? My expected output should look like this:
for SUP0
labels dbins dbins.16186 dbins.18185
SUP0 0.0 0.0
SUP0 0.2 0.2
SUP0 0.4 0.4
SUP0 0.6 0.6
SUP0 0.8 0.8
SUP0 1.0 NA
and for INF0
labels dbins.54234 dbins.18185 dbins.16186
INF0 0.0 0.0 0.0 0.0
INF0 0.2 0.2 0.0 0.2
INF0 0.4 0.4 0.0 0.4
INF0 0.6 0.6 0.0 0.6
INF0 0.8 0.8 0.8 0.8
INF0 NA 1.0 1.0 1.0
…so that I can take the mean over columns.
I have been trying with ddply and similar functions, but I can’t get this particular output format. Can someone please help?
Thanks in advance
The answer you want could be one of two things:
1. The exact output you’ve suggested.
2. The means of each of the categories (for which the output you’ve provided is just a method of getting there).
I’m going to use plyr and reshape2, but no doubt @mnel will be around soon to give a data.table solution.
1. The output you’ve suggested
The problem here is that you have several groups with multiple elements. So first, we need to group the elements (using @Maiasaura’s solution here).
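A minimal sketch of this grouping step, assuming the idea is to number the rows within each patids/labels group so that they can be lined up later. The column name `id` is my own choice for illustration, and the data frame is a cut-down copy of the question's data (SUP0 and INF0 for patients 16186 and 18185) so the snippet runs on its own:

```r
library(plyr)

# Subset of the question's data, rebuilt here so the snippet is self-contained:
# SUP0 and INF0 rows for patients 16186 and 18185 (dbins only).
testdata <- data.frame(
  patids = rep(c(16186, 18185, 16186, 18185), times = c(6, 5, 6, 6)),
  labels = rep(c("SUP0", "SUP0", "INF0", "INF0"), times = c(6, 5, 6, 6)),
  dbins  = c(0:5, 0:4, 0:5, 0:5) / 5
)

# Number the rows within each patids/labels group: 1, 2, 3, ...
# `id` is a hypothetical column name used only in this sketch.
testdata <- ddply(testdata, .(patids, labels), transform,
                  id = seq_along(dbins))
```

The `id` column gives every group a common index, so groups of different length can still be matched up row by row.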
Then we can reshape them properly:
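One way the reshape might look with `dcast` from reshape2, again as a sketch on a cut-down copy of the question's data. The row index `id` is a hypothetical helper column, and the name `testreshape` is taken from the subsetting line further down:

```r
library(plyr)
library(reshape2)

# Subset of the question's data (SUP0 and INF0, patients 16186 and 18185).
testdata <- data.frame(
  patids = rep(c(16186, 18185, 16186, 18185), times = c(6, 5, 6, 6)),
  labels = rep(c("SUP0", "SUP0", "INF0", "INF0"), times = c(6, 5, 6, 6)),
  dbins  = c(0:5, 0:4, 0:5, 0:5) / 5
)

# Row index within each patids/labels group (hypothetical column `id`).
testdata <- ddply(testdata, .(patids, labels), transform,
                  id = seq_along(dbins))

# Wide format: one row per labels/id pair, one column per patient.
# Combinations that do not occur in the data are filled with NA.
testreshape <- dcast(testdata, labels + id ~ patids, value.var = "dbins")
```

Because patient 18185 has only five SUP0 rows, the sixth SUP0 row holds `NA` in the `18185` column, which is the padding the question asks for.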
From here you can use something like
testreshape[testreshape$labels=="INF0",]
2. The means of each of the categories
This is much simpler:
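A sketch of how this could be done with `ddply` and `summarise`, assuming the quantity of interest is the mean of dbins within each labels/patids combination. The names `testmeans` and `mean_dbins` are made up for this example, and the data frame is a cut-down copy of the question's data:

```r
library(plyr)

# Subset of the question's data (SUP0 and INF0, patients 16186 and 18185).
testdata <- data.frame(
  patids = rep(c(16186, 18185, 16186, 18185), times = c(6, 5, 6, 6)),
  labels = rep(c("SUP0", "SUP0", "INF0", "INF0"), times = c(6, 5, 6, 6)),
  dbins  = c(0:5, 0:4, 0:5, 0:5) / 5
)

# Mean of dbins for every labels/patids combination.
# `testmeans` and `mean_dbins` are hypothetical names.
testmeans <- ddply(testdata, .(labels, patids), summarise,
                   mean_dbins = mean(dbins))
```

Each mean is computed within its own group, so the unequal group lengths need no padding. Dropping `patids` from the grouping, i.e. `.(labels)`, would give one mean per label instead.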