The probability distribution of rolling $n$ dice and keeping the $k$ highest
In many roleplaying games one rolls a handful of dice and calculates their sum. In some games there are bonus or penalty dice, so that we roll, for example, 4 dice with six sides and take the sum of the three highest, ignoring the lowest.
So let us fix some notation. We are rolling $n \ge 0$ dice with $s \ge 1$ sides. The dice are iid random variables, each uniformly distributed on $\{1, 2, \ldots, s\}$. We want to keep the $k \le n$ highest results and calculate their sum. We want to know the probability distribution of this sum, or at least as much of it as we can; what is the average, for example?
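The average, at least, does not need the full distribution. Writing $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ for the sorted results, the kept sum is $X_{(n-k+1)} + \cdots + X_{(n)}$, so by linearity of expectation its mean is the sum of the means of those order statistics. Since $X_{(j)} \ge m$ exactly when at least $n - j + 1$ of the dice show $m$ or more, and the number of such dice is binomial with success probability $p_m = (s - m + 1)/s$,
$$\Pr\bigl(X_{(j)} \ge m\bigr) = \sum_{i=n-j+1}^{n} \binom{n}{i} p_m^{\,i} (1 - p_m)^{n-i}, \qquad \mathbb{E}\bigl[X_{(j)}\bigr] = \sum_{m=1}^{s} \Pr\bigl(X_{(j)} \ge m\bigr).$$
A minimal Python sketch of this formula, using exact fractions (the function name `expected_keep_highest` is just an illustrative choice):

```python
from fractions import Fraction
from math import comb


def expected_keep_highest(n: int, k: int, s: int) -> Fraction:
    """Exact mean of the sum of the k highest of n fair s-sided dice."""
    total = Fraction(0)
    # The kept dice are the order statistics X_(j) for j = n-k+1, ..., n.
    for j in range(n - k + 1, n + 1):
        for m in range(1, s + 1):
            p = Fraction(s - m + 1, s)  # chance that one die shows m or more
            # X_(j) >= m  iff  at least n-j+1 of the n dice show m or more.
            total += sum(
                comb(n, i) * p**i * (1 - p) ** (n - i)
                for i in range(n - j + 1, n + 1)
            )
    return total
```

For 4d6 keeping the 3 highest, `expected_keep_highest(4, 3, 6)` returns `Fraction(15869, 1296)`, about 12.24.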
An analytical formula would be best, of course, but is probably out of reach. If $k = n$, that is, if we are not discarding any dice, I would calculate the probability distribution by representing a single die as a probability generating function and then multiplying polynomials to add independent dice. I don't think anything similar is possible here, but maybe I am wrong.
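For the $k = n$ case, the generating-function approach can be sketched as follows. One die has generating function $G(x) = \frac{1}{s}(x + x^2 + \cdots + x^s)$, and the sum of $n$ independent dice has generating function $G(x)^n$, so the distribution comes from repeated polynomial multiplication (a discrete convolution). The sketch below, with the hypothetical name `sum_distribution`, keeps integer counts and leaves out the factor $1/s^n$:

```python
def sum_distribution(n: int, s: int) -> list[int]:
    """Coefficients of (x + x^2 + ... + x^s)^n.

    Entry t is the number of the s**n equally likely outcomes whose sum is t,
    so dividing by s**n gives P(sum = t).
    """
    die = [0] + [1] * s  # one die: x + x^2 + ... + x^s (no constant term)
    poly = [1]  # the empty sum: the constant polynomial 1 = x^0
    for _ in range(n):
        # Multiply poly by die: convolve the two coefficient lists.
        product = [0] * (len(poly) + len(die) - 1)
        for a, ca in enumerate(poly):
            for b, cb in enumerate(die):
                product[a + b] += ca * cb
        poly = product
    return poly
```

For example, `sum_distribution(3, 6)[10]` is 27, that is, 27 of the 216 outcomes of 3d6 sum to 10.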
Of course, just going through every possible outcome of the dice (there are $s^n$ of them) is technically possible, but it provides little general insight.
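For small $n$ and $s$ the enumeration is still a handy reference for checking any cleverer method. A minimal Python sketch (the name `keep_highest_bruteforce` is illustrative) that walks through all $s^n$ equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def keep_highest_bruteforce(n: int, k: int, s: int) -> dict[int, Fraction]:
    """Exact distribution of the sum of the k highest of n fair s-sided dice,
    found by enumerating all s**n equally likely outcomes."""
    counts = Counter()
    for roll in product(range(1, s + 1), repeat=n):
        kept = sorted(roll, reverse=True)[:k]  # the k highest results
        counts[sum(kept)] += 1
    total = s**n
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}
```

It takes $s^n$ steps, so it becomes impractical quickly; 10d10 would already mean $10^{10}$ outcomes.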
I'm not a "mathematician" by trade; I'm a software engineer, so there may be a better way out there, but I figured I'd write up what I do know in case my perspective is useful.
The best way I've found to understand these kinds of probability problems is to use a calculator designed specifically for them.
So, for example, you can use AnyDice with a "program" saying
```
output [highest 3 of 4d6]
```
for your initial example, and it shows a graph of the distribution. You can replace the "3", "4", and "6" with any values of $k$, $n$, and $s$ you want.
My understanding (which may be wrong, but I think this is how it works) is that AnyDice isn't a numeric Monte Carlo approximation or the like, but actually works through all the possibilities for each die to find the exact probability distribution. I'm not aware of the source code being publicly available, and even if it were, I'm not sure one could derive a general formula from it for all possible $k$, $n$, and $s$.
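I don't know how AnyDice is implemented internally, so the following is not its algorithm, but here is one exact method (a swapped-in dynamic-programming technique) that avoids listing all $s^n$ outcomes. Visit the face values from $s$ down to $1$, and at each face decide how many of the still-unassigned dice show it; because the faces are visited in descending order, the first $k$ dice assigned are exactly the $k$ highest. The number of ways to choose which dice show a given face is a binomial coefficient, and the product of these over all faces is the multinomial count of the outcome. A Python sketch, with `keep_highest_dp` as a hypothetical name:

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def keep_highest_dp(n: int, k: int, s: int) -> dict[int, Fraction]:
    """Exact distribution of the sum of the k highest of n fair s-sided dice.

    Visits face values from s down to 1, deciding how many dice show each
    value; since faces are visited in descending order, the first k dice
    assigned are exactly the k highest.
    """
    # state (dice assigned, dice kept, kept sum) -> number of outcomes
    states = {(0, 0, 0): 1}
    for face in range(s, 0, -1):
        new_states = defaultdict(int)
        for (used, kept, total), ways in states.items():
            remaining = n - used
            # The lowest face must absorb every die not yet assigned.
            choices = [remaining] if face == 1 else range(remaining + 1)
            for c in choices:
                take = min(c, k - kept)  # how many of these c dice are kept
                key = (used + c, kept + take, total + take * face)
                # comb(remaining, c): which unassigned dice show this face
                new_states[key] += ways * comb(remaining, c)
        states = new_states
    outcomes = s**n
    dist = defaultdict(int)
    for (_, _, total), ways in states.items():
        dist[total] += ways
    return {t: Fraction(w, outcomes) for t, w in sorted(dist.items())}
```

The number of states stays polynomial in $n$, $k$, and $s$, so this remains feasible for sizes where enumerating $s^n$ outcomes is hopeless.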
In general, randomness has been tricky to model, and its patterns hard to see, even with modern programming languages. As "big data" analysis grows as a field, more tools designed to help model probabilistic events, such as "Bean Machine", are being developed. You might be able to use tools like those to look for the "general insight" you're searching for.
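For comparison, the simplest numerical model is a plain Monte Carlo simulation, which only approximates the distribution. This sketch uses Python's standard `random` module rather than Bean Machine or any other specialized tool, and the name `keep_highest_monte_carlo` is hypothetical:

```python
import heapq
import random
from collections import Counter


def keep_highest_monte_carlo(
    n: int, k: int, s: int, trials: int = 100_000, seed: int = 0
) -> dict[int, float]:
    """Approximate distribution of the sum of the k highest of n fair
    s-sided dice, estimated from `trials` simulated rolls."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    counts = Counter()
    for _ in range(trials):
        roll = [rng.randint(1, s) for _ in range(n)]
        counts[sum(heapq.nlargest(k, roll))] += 1
    return {t: c / trials for t, c in sorted(counts.items())}
```

The estimates carry sampling error of order $1/\sqrt{\text{trials}}$, which is why an exact method is preferable whenever it is affordable.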