I asked this question a year ago and got code for this “probability heatmap”:

numbet <- 32    # number of bets per trial
numtri <- 1e5   # number of simulated trials
prob <- 5/6     # probability of a 0 (tails)
# Fill a matrix: one row per trial, first column is the trial id
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
  xcum[i, ] <- c(i, cumsum(x)/cumsum(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))
# Reshape to long format: one row per (trial, bet)
mxcum <- reshape(data.frame(xcum), varying=1+1:numbet,
                 idvar="trial", v.names="outcome", direction="long", timevar="bet")
library(plyr)
# Count how often each outcome occurs for each bet
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
# Stack equal-height bins per bet and compute the fill proportion
mxcum3 <- ddply(mxcum2, .(bet), summarize,
                ymin=c(0, head(seq_along(V1)/length(V1), -1)),
                ymax=seq_along(V1)/length(V1),
                fill=(V1/sum(V1)))
head(mxcum3)
library(ggplot2)
# formatter="percent" was removed from later ggplot2 versions; labels= replaces it
p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) +
  geom_rect(aes(fill=fill), colour="grey80") +
  scale_fill_gradient("Outcome", labels=scales::percent, low="red", high="blue") +
  scale_y_continuous(labels=scales::percent) +
  xlab("Bet")
print(p)
(May need to change this code slightly because of this)
This is almost exactly what I want, except that each vertical shaft should have a different number of bins, i.e. the first should have 2, the second 3, the third 4 (N+1). In the graph, shafts 6 and 7 have the same number of bins (7), where shaft 7 should have 8 (N+1).
If I’m right, the code does this because it plots the observed data, and if I ran more trials we would get more bins. I don’t want to rely on the number of trials to get the correct number of bins.
How can I adapt this code to give the correct number of bins?
I have used R’s `dbinom` to generate the frequency of heads for `n = 1:32` trials and plotted the graph now. It will be what you expect. I have read some of your earlier posts here on SO and on math.stackexchange. Still, I don’t understand why you’d want to simulate the experiment rather than generate from a binomial random variable. If you could explain it, it would be great! I’ll try to work on the simulated solution from @Andrie to see if I can match the output shown below. For now, here’s something you might be interested in.

The plot:

Edit: Explanation of how your old code from Andrie works and why it doesn’t give what you intend.

Basically, what Andrie did (or rather, one way to look at it) is to use the idea that if you have two independent binomial random variables, `X ~ B(n, p)` and `Y ~ B(m, p)`, where `n`, `m` = size and `p` = probability of success, then their sum is `X + Y ~ B(n + m, p)` (1). So the purpose of `xcum` is to obtain the outcome for all `n = 1:32` tosses, but to explain it better, let me construct the code step by step. Along with the explanation, the code for `xcum` will also become obvious, and it can be constructed in no time (without any need for a `for`-loop and constructing a `cumsum` every time).

If you have followed me so far, then our idea is first to create a `numtri * numbet` matrix, with each column (length = `numtri`) having `0`’s and `1`’s with probability `5/6` and `1/6` respectively. That is, if you have `numtri = 1000`, then you’ll have roughly 834 `0`’s and 166 `1`’s in each of the `numbet` columns (= 32 here). Let’s construct this and test this first.

Now, each of these columns is a sample of size `numtri` from a binomial distribution with `n = 1`. If we were to add the first two columns and replace the second column with this sum, then from (1), since the probabilities are equal, we’d end up with a binomial distribution with `n = 2`. Similarly, if you had added the first three columns and replaced the third column by this sum, you would have obtained a binomial distribution with `n = 3`, and so on. The concept is that if you cumulatively add each column, then you end up with `numbet` binomial distributions (1 to 32 here). So, let’s do that.

If you divide the `xcum` we have generated thus far by `cumsum(1:numbet)` over each row, this will be identical to the `xcum` matrix that comes out of the `for`-loop (if you generate it with the same seed). However, I don’t quite understand the reason for this division by Andrie, as it is not necessary to generate the graph you require. I suppose it has something to do with the frequency values you talked about in an earlier post on math.stackexchange.

Now on to why you have difficulties obtaining the graph I had attached (with `n+1` bins):

For a binomial distribution with `n = 1:32` trials, `5/6` as the probability of tails (failures) and `1/6` as the probability of heads (successes), the probability of `k` heads is given by:

`P(k heads in n trials) = choose(n, k) * (1/6)^k * (5/6)^(n - k)`

For the test data we’ve generated, for `n = 7` and `n = 8` (trials), the probabilities of `k = 0:7` and `k = 0:8` heads are given by:

Why do they both have 6 bins and not 8 and 9 bins? Of course, this has to do with the value of `numtri = 1000`. Let’s see what the probabilities of each of these 8 and 9 bins are by generating them directly from the binomial distribution using `dbinom`, to understand why this happens.

You see that the probabilities for `k = 6, 7` (with `n = 7`) and `k = 6, 7, 8` (with `n = 8`) are ~0. The minimum value here is about `5.95e-7` (`n = 8`, `k = 8`, i.e. `(1/6)^8`). This means you would expect about one such outcome per `1 / 5.95e-7`, i.e. roughly 1.7 million, simulated trials. If you check the same for `n = 32` and `k = 32`, the value is `1.256493e-25`. So you’d have to simulate on the order of `1e25` trials to expect at least one result where all 32 outcomes are heads for `n = 32`.

This is why your results had no values for certain bins: the probability of landing in them is very low for the given `numtri`. For the same reason, generating the probabilities directly from the binomial distribution overcomes this limitation.

I hope I’ve managed to write with enough clarity for you to follow. Let me know if you have trouble going through it.
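The step-by-step construction described above can be sketched roughly as follows. This is a minimal sketch and not the exact code from this answer or from Andrie: the seed and the names `x`, `observed_bins`, `p7`, `p8`, `trials_n8` and `trials_n32` are my own assumptions for illustration, and the division by `cumsum(1:numbet)` is left out because it is not needed to count bins.

```r
set.seed(42)   # assumed seed, only so the sketch is reproducible
numbet <- 32
numtri <- 1000
prob   <- 5/6  # probability of a 0 (tails), as in the question

# numtri x numbet matrix of 0/1 outcomes: one row per trial, one column per bet
x <- matrix(sample(c(0, 1), numtri * numbet, replace = TRUE,
                   prob = c(prob, 1 - prob)),
            nrow = numtri, ncol = numbet)

# Cumulative sum along each row: column n holds the number of heads after n bets
xcum <- t(apply(x, 1, cumsum))

# Number of distinct head counts (bins) actually observed for each n
observed_bins <- apply(xcum, 2, function(col) length(unique(col)))

# Exact probabilities of k heads for n = 7 and n = 8
p7 <- dbinom(0:7, size = 7, prob = 1 - prob)
p8 <- dbinom(0:8, size = 8, prob = 1 - prob)

# Expected number of trials per all-heads outcome: 6^8 for n = 8, 6^32 for n = 32
trials_n8  <- 1 / dbinom(8, size = 8, prob = 1 - prob)
trials_n32 <- 1 / dbinom(32, size = 32, prob = 1 - prob)

# Compare the observed number of bins with the n + 1 bins that are possible
print(rbind(n = 1:numbet, observed = observed_bins, possible = 1:numbet + 1))
```

Comparing the `observed` row with the `possible` row shows how far a simulation of this size can fall short of `n + 1` bins as `n` grows, while `p7` and `p8` show how small the probabilities of the missing bins are.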
Edit 2:
When I ran the code I’ve just edited above with `numtri = 1e6`, I get this for `n = 7` and `n = 8`, counting the number of heads for `k = 0:7` and `k = 0:8`:

Note that there are now values for `k = 6` and `k = 7` for `n = 7` and `n = 8`. Also, for `n = 8`, you have a count of 1 for `k = 8`. With increasing `numtri` you’ll obtain more of the other missing bins, but it will require a huge amount of time and memory (if it is feasible at all).
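To avoid depending on `numtri` altogether, the bins can be generated directly from `dbinom`, so that bet `n` always gets exactly `n + 1` bins. The following is only a sketch of that idea under my own assumptions, not the code that produced the plot above: the data frame name `bins`, the column `k`, and the variable `p_head` are hypothetical names, and the bins are stacked with equal heights as in the original `ymin`/`ymax` computation.

```r
numbet <- 32
p_head <- 1/6   # probability of heads (success), as in the question

# One row per (bet, k) pair: bet n contributes exactly n + 1 rows, k = 0..n
bins <- do.call(rbind, lapply(1:numbet, function(n) {
  k <- 0:n
  data.frame(bet  = n,
             k    = k,
             ymin = k / (n + 1),        # equal-height bins stacked from 0 to 1
             ymax = (k + 1) / (n + 1),
             fill = dbinom(k, size = n, prob = p_head))
}))

# Sanity check: the number of bins for bet n is n + 1
table(bins$bet)

# Optional plot, mirroring the geom_rect layout of the original code
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(bins, aes(xmin = bet - 0.5, xmax = bet + 0.5,
                        ymin = ymin, ymax = ymax)) +
    geom_rect(aes(fill = fill), colour = "grey80") +
    scale_fill_gradient("Probability", low = "red", high = "blue") +
    xlab("Bet")
  # print(p)  # uncomment to draw the heatmap
}
```

Because `fill` holds exact probabilities rather than observed frequencies, even the bins whose probability is as low as `1e-25` are present; they are simply coloured at the low end of the gradient.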