The Archive Base

Editorial Team
Asked: June 17, 2026

I asked this question a year ago and got code for this probability heatmap:

I asked this question a year ago and got code for this “probability heatmap”:
[Image: heatmap]

numbet <- 32
numtri <- 1e5
prob=5/6
#Fill a matrix 
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
xcum[i, ] <- c(i, cumsum(x)/cumsum(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))

mxcum <- reshape(data.frame(xcum), varying=1+1:numbet, 
idvar="trial", v.names="outcome", direction="long", timevar="bet")


library(plyr)
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
mxcum3 <- ddply(mxcum2, .(bet), summarize, 
            ymin=c(0, head(seq_along(V1)/length(V1), -1)), 
            ymax=seq_along(V1)/length(V1),
            fill=(V1/sum(V1)))
head(mxcum3)

library(ggplot2)

p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) + 
geom_rect(aes(fill=fill), colour="grey80") + 
scale_fill_gradient("Outcome", formatter="percent", low="red", high="blue") +
scale_y_continuous(formatter="percent") +
xlab("Bet")

print(p)

(May need to change this code slightly because of this)

This is almost exactly what I want, except that each vertical shaft should have a different number of bins: the first should have 2, the second 3, the third 4, and so on (N+1 for bet N). In the graph, shafts 6 and 7 have the same number of bins (7), whereas shaft 7 should have 8 (N+1).

If I’m right, the code does this because the bins come from the observed data, so running more trials would produce more bins. I don’t want to rely on the number of trials to get the correct number of bins.

How can I adapt this code to give the correct number of bins?

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 1:21 pm

    I have used R’s dbinom to generate the probability of each number of heads for n=1:32 bets and plotted the result; it should be what you expect. I have read some of your earlier posts here on SO and on math.stackexchange, but I still don’t understand why you’d want to simulate the experiment rather than generate the values directly from a binomial random variable. If you could explain, that would be great! I’ll try to work on the simulated solution from @Andrie to see whether I can match the output shown below. For now, here’s something you might be interested in.

    numbet <- 32
    numtri <- 1e5
    prob <- 5/6    # probability of a 0 (tails) on a single bet
    
    library(plyr)
    out <- ldply(1:numbet, function(idx) {
        # exact probabilities of 0, 1, ..., idx heads (i.e. idx, ..., 0 tails)
        outcome <- dbinom(idx:0, size=idx, prob=prob)
        bet     <- rep(idx, length(outcome))
        # expected counts in numtri trials; only the length (idx + 1) is used below
        N       <- round(outcome * numtri)
        # stack idx + 1 equal-height bins in the column for this bet
        ymin    <- c(0, head(seq_along(N)/length(N), -1))
        ymax    <- seq_along(N)/length(N)
        data.frame(bet, fill=outcome, ymin, ymax)
    })
    
    library(ggplot2)
    p <- ggplot(out, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) + 
        geom_rect(aes(fill=fill), colour="grey80") + 
        scale_fill_gradient("Outcome", low="red", high="blue") +
        xlab("Bet")
    
    print(p)
    

    The plot:

    [Image: ggplot2 plot]

    Edit: Explanation of how your old code from Andrie works and why it doesn’t give what you intend.

    Basically, what Andrie did (or rather, one way to look at it) relies on the fact that if you have two independent binomial random variables, X ~ B(n, p) and Y ~ B(m, p), where n and m are the sizes and p is the probability of success, then their sum is also binomial: X + Y ~ B(n + m, p) (1). The purpose of xcum is to obtain the outcome for all n = 1:32 tosses, but to explain it better, let me construct the code step by step. Along the way it will become clear how xcum works, and that it can be built quickly without a for-loop that computes a cumsum on every iteration.
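
Property (1) can be checked exactly, without any simulation. The sketch below is only an illustration: the values of n, m and p are arbitrary choices, and the variable names are not from the original code. It builds the distribution of X + Y by summing the joint probabilities of all pairs that give the same total, and compares the result with dbinom for size n + m.

```r
# Illustrative values; any sizes and any common success probability would do
n <- 2
m <- 3
p <- 1/6

px <- dbinom(0:n, size = n, prob = p)   # P(X = 0), ..., P(X = n)
py <- dbinom(0:m, size = m, prob = p)   # P(Y = 0), ..., P(Y = m)

# For independent X and Y, joint[i + 1, j + 1] = P(X = i) * P(Y = j)
joint  <- outer(px, py)
totals <- outer(0:n, 0:m, "+")          # value of i + j in each cell

# P(X + Y = s): add up every cell whose total is s
p_sum <- vapply(0:(n + m), function(s) sum(joint[totals == s]), numeric(1))

# Distribution of a single binomial with size n + m
direct <- dbinom(0:(n + m), size = n + m, prob = p)

sum_matches <- isTRUE(all.equal(p_sum, direct))
print(sum_matches)
```

The check depends on independence and on both variables sharing the same p; with different probabilities the sum is no longer binomial.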

    If you have followed me so far, the idea is first to create a numtri * numbet matrix in which each column (length = numtri) holds 0's and 1's drawn with probability 5/6 and 1/6 respectively. That is, with numtri = 1000 you’ll have roughly 833 0's and 167 1's in each of the numbet columns (= 32 here). Let’s construct this and test it first.

    numtri <- 1e3
    numbet <- 32
    set.seed(45)
    xcum <- t(replicate(numtri, sample(0:1, numbet, prob=c(5/6,1/6), replace = TRUE)))
    
    # check for count of 1's
    > apply(xcum, 2, sum)
    [1] 169 158 166 166 160 182 164 181 168 140 154 142 169 168 159 187 176 155 151 151 166 
    163 164 176 162 160 177 157 163 166 146 170
    
    # So, the counts of 1's are "approximately" what we expect (around 1000/6, i.e. about 167).
    

    Now, each of these columns holds numtri samples from a binomial distribution with n = 1. If we add the first two columns and replace the second column with this sum then, from (1), since the probabilities are equal, we end up with samples from a binomial distribution with n = 2. Similarly, adding the first three columns and replacing the 3rd column with this sum gives a binomial distribution with n = 3, and so on.
    The concept is that if you cumulatively add the columns along each row, you end up with numbet binomial distributions (n = 1 to 32 here). So, let’s do that.

    xcum <- t(apply(xcum, 1, cumsum))
    
    # you can verify that the second column has similar probabilities by this:
    # calculate the frequency of all values in 2nd column.
    > table(xcum[,2])
      0   1   2 
    694 285  21 
    
    > round(numtri * dbinom(2:0, 2, prob=5/6))
    [1] 694 278  28
    # more or less identical, good!
    

    If you divide the xcum we have generated so far by cumsum(1:numbet) along each row, like this:

    xcum <- xcum/matrix(rep(cumsum(1:numbet), each=numtri), ncol = numbet)
    

    this will be identical to the xcum matrix that comes out of the for-loop, apart from its extra trial column (provided you generate both with the same seed). However, I don’t quite understand why Andrie performs this division, as it is not necessary to generate the graph you require. I suppose it has something to do with the frequency values you talked about in an earlier post on math.stackexchange.
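
The equivalence of the two constructions can be tested on a small example. This is a sketch under stated assumptions: the sizes (50 trials, 8 bets) and the seed are arbitrary, the names loop_xcum and vec_xcum are made up for the comparison, and the trial column of the original loop is left out so that the two matrices have the same shape.

```r
numtri <- 50
numbet <- 8
prob   <- 5/6

# Loop version, following the structure of the question's code (no trial column)
set.seed(1)
loop_xcum <- matrix(NA_real_, nrow = numtri, ncol = numbet)
for (i in 1:numtri) {
  x <- sample(c(0, 1), numbet, prob = c(prob, 1 - prob), replace = TRUE)
  loop_xcum[i, ] <- cumsum(x) / cumsum(1:numbet)
}

# Vectorised version, following the three steps described in this answer
set.seed(1)
vec_xcum <- t(replicate(numtri,
                        sample(0:1, numbet, prob = c(prob, 1 - prob), replace = TRUE)))
vec_xcum <- t(apply(vec_xcum, 1, cumsum))
vec_xcum <- vec_xcum / matrix(rep(cumsum(1:numbet), each = numtri), ncol = numbet)

# Both versions draw the same random numbers in the same order
same_result <- isTRUE(all.equal(loop_xcum, vec_xcum))
print(same_result)
```

Resetting the seed before each version is what makes the comparison meaningful; without it the two matrices would only agree in distribution.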

    Now on to why you have difficulty obtaining the graph I attached (with n+1 bins):

    For a binomial distribution with n=1:32 trials, 5/6 as the probability of tails (failures) and 1/6 as the probability of heads (successes), the probability of k heads is given by:

    nCk * (5/6)^(n-k) * (1/6)^k # where nCk is n choose k
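
As a quick sanity check, the probability of k heads in n bets can be computed from the standard binomial formula, with exponent n - k on the tails probability, and compared with dbinom. The sketch below uses n = 7 as an example; the variable names are illustrative only.

```r
# Probability of k heads in n bets, with P(head) = 1/6 and P(tail) = 5/6
n <- 7
k <- 0:n
by_formula <- choose(n, k) * (5/6)^(n - k) * (1/6)^k

# The same values from dbinom, counting heads directly ...
by_heads <- dbinom(k, size = n, prob = 1/6)
# ... or counting tails in reverse order, as dbinom(idx:0, size=idx, prob=5/6) does
by_tails <- dbinom(n:0, size = n, prob = 5/6)

formula_ok <- isTRUE(all.equal(by_formula, by_heads)) &&
              isTRUE(all.equal(by_formula, by_tails))
print(formula_ok)
print(round(by_formula, 3))
```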
    

    For the test data we’ve generated, for n=7 and n=8 (trials), the observed proportions of k heads are:

    # n=7
       0    1    2     3     4     5 
    .278 .394 .233  .077  .016  .002 
    
    # n=8
       0    1    2    3     4      5 
    .229 .375 .254 .111  .025   .006 
    

    Why do they both have 6 bins rather than 8 and 9? This comes down to the value of numtri = 1000. To understand why it happens, let’s generate the probabilities of each of these 8 and 9 bins directly from the binomial distribution using dbinom.

    # n = 7
    dbinom(7:0, 7, prob=5/6)
    # output rounded to 3 decimal places
    [1] 0.279 0.391 0.234 0.078 0.016 0.002 0.000 0.000
    
    # n = 8
    dbinom(8:0, 8, prob=5/6)
    # output rounded to 3 decimal places
    [1] 0.233 0.372 0.260 0.104 0.026 0.004 0.000 0.000 0.000
    

    You can see that the probabilities for k=6,7 (n=7) and k=6,7,8 (n=8) are approximately 0. The smallest value here is about 5.95 * 1e-7 (n=8, k=8), i.e. (1/6)^8. This means you would expect to see that outcome about once if you simulated 1/(5.95 * 1e-7), roughly 1.7 million, trials. If you check the same for n=32 and k=32, the value is 1.256493 * 1e-25, so you would need on the order of 8 * 1e24 trials to expect even one result where all 32 outcomes are heads.

    This is why your results had no values for certain bins: the probability of landing in them is very low for the given numtri. For the same reason, generating the probabilities directly from the binomial distribution avoids this limitation.
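
To put numbers on this, the sketch below tabulates, for n = 8 and numtri = 1000, the expected count in each bin, the probability that the bin stays empty over all trials, and the number of trials at which the expected count reaches 1. The column and variable names are illustrative and not part of the original code.

```r
numtri  <- 1e3
n       <- 8
p_heads <- 1/6

# Exact probability of k = 0..n heads
p_k <- dbinom(0:n, size = n, prob = p_heads)

# Expected number of simulated trials that land in each bin
expected_count <- numtri * p_k

# Probability that a bin is never hit in numtri independent trials
p_empty <- (1 - p_k)^numtri

# Number of trials at which the expected count for a bin reaches 1
trials_for_one <- round(1 / p_k)

summary_tab <- data.frame(k        = 0:n,
                          prob     = signif(p_k, 3),
                          expected = round(expected_count, 2),
                          p_empty  = round(p_empty, 3),
                          trials_for_one = trials_for_one)
print(summary_tab)
```

Bins whose expected count is well below 1 will usually be missing from a simulation of that size, which matches the gaps seen in the simulated heatmap.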

    I hope I’ve written this clearly enough for you to follow. Let me know if you have trouble going through it.

    Edit 2:
    When I ran the simulation code above with numtri=1e6 and counted the number of heads for n=7 (k=0:7) and n=8 (k=0:8), I got this:

    # n = 7
         0      1      2      3      4      5      6      7 
    279347 391386 233771  77698  15763   1915    117      3 
    
    # n = 8
         0      1      2      3      4      5      6      7      8 
    232835 372466 259856 104116  26041   4271    392     22      1 
    

    Note that there are now counts for k=6 and k=7 for both n=7 and n=8. Also, for n=8, there is a count of 1 for k=8. With increasing numtri you’ll fill in more of the missing bins, but it will require a huge amount of time and memory (if it is feasible at all).
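
If the simulation has to be kept, one possible way to get n+1 bins regardless of numtri is to count against the full set of possible outcomes rather than only the observed ones. The following is a minimal sketch, not the original author's method: it uses base R only, a small numbet so the result is easy to inspect, and names (bets, heads, bins) that are made up for the example. The key step is factor(..., levels = 0:n), which makes table() report a zero count for outcomes that never occurred.

```r
set.seed(45)
numtri <- 1e3
numbet <- 8   # kept small here; the question uses 32

# numtri rows of numbet bets; 1 = head with probability 1/6
bets <- matrix(sample(0:1, numtri * numbet, replace = TRUE, prob = c(5/6, 1/6)),
               nrow = numtri)
heads <- t(apply(bets, 1, cumsum))   # heads[i, n] = heads after n bets in trial i

# Count every possible outcome 0..n for each n, including unobserved ones
bins <- do.call(rbind, lapply(seq_len(numbet), function(n) {
  counts <- table(factor(heads[, n], levels = 0:n))
  data.frame(bet   = n,
             heads = 0:n,
             fill  = as.vector(counts) / numtri,   # observed proportion, may be 0
             ymin  = (0:n) / (n + 1),
             ymax  = (1:(n + 1)) / (n + 1))
}))

bins_per_bet <- as.vector(table(bins$bet))
print(bins_per_bet)   # 2 3 4 5 6 7 8 9
```

The resulting data frame has the same bet, fill, ymin and ymax columns as the data passed to ggplot in the code above, so it should be usable with the same geom_rect call. Empty bins are then drawn with a fill of 0 instead of being dropped.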
