# The probability distribution of rolling $n$ dice and keeping $k$ highest

In many roleplaying games one rolls a handful of dice and calculates their sum. In some games there are bonus or penalty dice, so that we roll, for example, 4 dice with six sides and take the sum of the three highest, ignoring the lowest.

So let us fix some notation. We are rolling $n \ge 0$ dice with $s \ge 1$ sides. The dice are independent and identically distributed, each selecting a number uniformly at random from $\{1, 2, \ldots, s\}$. We want to keep the $k \le n$ highest results and calculate their sum. **We want to know the probability distribution**, or at least as much of it as we can; what is the average, for example?

An analytical formula would be best, of course, but it is probably out of reach. If $k = n$, that is, we are not discarding any dice, the way I would calculate the probability distribution is to represent a single die as a probability generating function and then use multiplication of polynomials for the addition of probability distributions. I don't think anything similar is possible here, but maybe I am wrong.
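For concreteness, the $k = n$ approach can be sketched in Python as below. This is only an illustration of the generating-function idea; the function name `sum_distribution` and the use of exact fractions are choices made for this sketch, not part of the question.

```python
from fractions import Fraction


def sum_distribution(n, s):
    """Distribution of the sum of n fair s-sided dice (the k = n case).

    Returns a dict mapping each possible total to its exact probability.
    The name and interface are hypothetical, chosen for this sketch.
    """
    # Coefficient list of the running polynomial: index = exponent = total.
    # The empty sum (n = 0) is the constant polynomial 1.
    poly = [Fraction(1)]
    # Generating function of one die: (x + x^2 + ... + x^s) / s.
    die = [Fraction(0)] + [Fraction(1, s)] * s
    for _ in range(n):
        # Multiplying polynomials adds the corresponding random variables.
        product = [Fraction(0)] * (len(poly) + len(die) - 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(die):
                product[i + j] += a * b
        poly = product
    return {total: p for total, p in enumerate(poly) if p != 0}
```

For example, `sum_distribution(2, 6)[7]` is `Fraction(1, 6)`, the familiar chance of rolling a 7 on two six-sided dice.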

Of course, just going through every possible combination of die results is technically possible, but it provides little general insight.

## 1 answer

I'm not a "mathematician" by trade, I'm a software engineer, so there may be a better way out there, but I figured I'd write up what I do know in case my perspective is useful.

The best way I've found to understand these kinds of probability problems is with a dedicated calculator designed for them.

So, for example, you can use Anydice with a "program" saying

```
output [highest 3 of 4d6]
```

for your initial example, and it provides a graph showing the distribution. You can change the "3", "4", and "6" for any values of *k*, *n*, and *s* you want.

My understanding (which may be wrong, but I *think* this is how it works) is that Anydice isn't a numeric Monte Carlo approximation or the like, but that it actually works through all the possibilities for each die being rolled to find the real probability curve. I'm not aware of the source code being publicly available, and even if it were, I'm not sure one could derive a general formula from it for all possible *k*/*n*/*s*.
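To make "working through all the possibilities" concrete, here is a minimal brute-force sketch in Python. It is not Anydice's actual algorithm, which I haven't seen; the names `keep_highest_distribution` and `mean` are made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def keep_highest_distribution(n, k, s):
    """Exact distribution of the sum of the k highest of n fair s-sided dice.

    Brute force over all s**n equally likely ordered rolls; hypothetical
    helper for illustration, not how any particular tool works.
    """
    counts = Counter()
    for roll in product(range(1, s + 1), repeat=n):
        kept = sorted(roll, reverse=True)[:k]  # the k highest results
        counts[sum(kept)] += 1
    total = s ** n
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}


def mean(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(value * p for value, p in dist.items())


dist = keep_highest_distribution(4, 3, 6)  # "highest 3 of 4d6"
print(float(mean(dist)))
```

The running time grows as $s^n$, so this only works for small pools of dice, and as the question notes it gives numbers rather than a formula.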

In general, randomness has been tricky to model, and its patterns hard to see, even with modern programming languages. As "big data" analysis grows as a field, more tools designed to help model probabilistic events, such as "Bean Machine", are being developed. You might be able to use tools like those to look for the "general insight" you're searching for.

#### 1 comment

Thanks; my interest in this case is specifically in a formula, rather than something that lists all the possibilities.

## 5 comments

You might be interested in Troll, a language and interpreter for expressing complex dice probabilities. The case you're interested in would be

`sum largest k nDs`

in the Troll language (with suitable constants for $k$, $n$, and $s$). Perhaps one of the papers, or the source code, holds a general answer for you. — r~~ 7 months ago

The case $k=1$ is also easy: we take $X = \max_{i=1}^n(X_i)$ and observe that for $x \in [1, s]$, $P(X \le x) = \left(\frac{x}{s}\right)^n$ because each independent die must roll no more than $x$. From that we can get $P(X = x)$ in closed form and $E(X)$ in terms of Faulhaber's formulas. — Peter Taylor 7 months ago
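Spelling out the $k = 1$ case: differencing the CDF gives $P(X = x) = \frac{x^n - (x-1)^n}{s^n}$, and summing $P(X \ge x)$ over $x = 1, \ldots, s$ gives $E(X) = s - \frac{1}{s^n}\sum_{x=1}^{s-1} x^n$, where the power sum is what Faulhaber's formula evaluates. A small Python check, assuming $n \ge 1$; the function names are made up for this sketch:

```python
from fractions import Fraction


def max_pmf(n, s):
    """P(X = x) for the maximum X of n >= 1 fair s-sided dice."""
    # Difference of the CDF (x/s)^n at x and x - 1.
    return {x: Fraction(x**n - (x - 1)**n, s**n) for x in range(1, s + 1)}


def max_mean(n, s):
    """E(X) = s - (1/s^n) * sum_{x=1}^{s-1} x^n, for n >= 1."""
    return s - Fraction(sum(x**n for x in range(1, s)), s**n)


# The closed-form mean agrees with the mean computed from the pmf.
pmf = max_pmf(2, 6)
assert sum(x * p for x, p in pmf.items()) == max_mean(2, 6)
```

For two six-sided dice this gives $E(X) = 6 - \frac{55}{36} = \frac{161}{36}$.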

@Peter Taylor Thanks, I am aware. @r~~ Thanks, I did not find a general solution in the papers. — tommi 7 months ago

The keyword to search for is "order statistics". Searching for "order statistics dice" or similar gets us a thread at CrossValidated, "Relevancy of order statistics to the roll-and-keep dice mechanic?" I think the most important insight is at the end of the answer there: "Order statistics for discrete distributions are messy, so I don't expect to find a big simplification by using them." — Stephan Kolassa 6 months ago

@Stephan Kolassa Could you write an answer based on that? — tommi 6 months ago