I have a situation where I am given a total ticket count and cumulative ticket sale data, as follows:
Total Tickets Available: 300
Day 1: 15 tickets sold to date
Day 2: 20 tickets sold to date
Day 3: 25 tickets sold to date
Day 4: 30 tickets sold to date
Day 5: 46 tickets sold to date
The number of tickets sold is nonlinear. If someone plans to buy a ticket on Day 23, what is the probability that he will get one?
I’ve been looking at quite a few libraries used for curve fitting, like numpy, PyLab, and sage, but I’ve been a bit overwhelmed since statistics is not in my background. How would I easily calculate a probability given this set of data? If it helps, I also have ticket sale data for other locations, where the curve should be somewhat different.
The best answer to this question would require more information about the problem: are people more or less likely to buy a ticket as the date approaches (and by how much)? Are there advertising events that will transiently affect the rate of sales? And so on.
We don’t have access to that information, though, so let’s just assume, as a first approximation, that the rate of ticket sales is constant. Since sales occur basically at random, they might be best modeled as a Poisson process. Note that this does not account for the fact that many people will buy more than one ticket, but I don’t think that will make much difference to the results; perhaps a real statistician could chime in here. Also, I’m going to discuss the constant-rate Poisson process here, but since you mentioned that the rate is decidedly NOT constant, you could look into variable-rate Poisson processes as a next step.
To model a Poisson process, all you need is the average rate of ticket sales. In your example data, sales-per-day are [15, 5, 5, 5, 16], so the average rate is about 9.2 tickets per day. We’ve already sold 46 tickets, so there are 254 remaining.
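As a rough sketch of that arithmetic (the variable names here are just illustrative), the per-day sales and the average rate can be recovered from the cumulative figures in plain Python:

```python
# Cumulative tickets sold at the end of Day 1..Day 5, from the question.
total_tickets = 300
cumulative_sold = [15, 20, 25, 30, 46]

# Per-day sales are the differences between consecutive cumulative totals
# (the first day is compared against zero).
daily_sales = [b - a for a, b in zip([0] + cumulative_sold, cumulative_sold)]

# Average rate in tickets per day, assuming the rate is constant.
rate = sum(daily_sales) / len(daily_sales)

# Tickets still unsold after the last observed day.
remaining = total_tickets - cumulative_sold[-1]

print(daily_sales, rate, remaining)
```

With only five data points the estimated rate is quite noisy, so treat it as a starting point rather than a precise figure.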
From here, it is simple to ask, “Given a rate of 9.2 tpd, what is the probability of selling fewer than 254 tickets in 23 days?” (ignore the fact that you can’t sell more than 300 tickets). The way to calculate this is with a cumulative distribution function (look up the CDF of the Poisson distribution).
On average, we would expect to sell 23 * 9.2 = 211.6 tickets after 23 days, so in the language of probability distributions, the expectation value is 211.6. The CDF tells us, “given an expectation value λ, what is the probability of seeing a value <= x”. You can do the math yourself or ask scipy to do it for you:
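A minimal sketch of the scipy route, assuming `scipy` is installed and using `scipy.stats.poisson.cdf(k, mu)`, which returns the probability of seeing a value <= k when the expectation value is mu. The variable names are only for illustration:

```python
from scipy.stats import poisson

rate = 9.2        # average tickets sold per day (assumed constant)
days = 23         # number of further days of sales being considered
remaining = 254   # tickets still unsold today

# Expected number of tickets sold over the period (the Poisson mean).
expected_sales = rate * days

# Probability of selling at most `remaining` tickets in that time.
# Strictly, "at least one ticket left" means selling at most remaining - 1;
# the difference is tiny here, so this follows the simpler form.
p_available = poisson.cdf(remaining, expected_sales)

print(f"expected sales: {expected_sales:.1f}")
print(f"probability a ticket is still available: {p_available:.4f}")
```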
So this tells us: IF ticket sales can be accurately represented as a Poisson process and IF the average rate of ticket sales really is 9.2 tpd, then the probability of at least one ticket being available after 23 more days is 99.7%.
Now let’s say someone wants to bring a group of 50 friends and wants to know the probability of getting all 50 tickets if they buy them in 25 days (rephrase the question as “If we expect on average to sell 9.2 * 25 tickets, what is the probability of selling <= (254-50) tickets?”):
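The same kind of sketch works for the group case, again assuming `scipy` is available; only the expectation value and the threshold change (names are illustrative):

```python
from scipy.stats import poisson

rate = 9.2        # average tickets sold per day (assumed constant)
days = 25         # days until the group buys its tickets
remaining = 254   # tickets still unsold today
group_size = 50   # tickets the group needs

# Expected number of tickets sold to other people over the period.
expected_sales = rate * days

# The group gets all its tickets only if at most (remaining - group_size)
# tickets have been sold by then.
p_group = poisson.cdf(remaining - group_size, expected_sales)

print(f"probability {group_size} tickets are still available: {p_group:.3f}")
```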
So the probability of having 50 tickets available after 25 days is about 4%.