My son recently picked up a copy of Rubiks Race, a kids’ board game
that’s essentially a slide puzzle turned into a two-player showdown.
You first generate a random target pattern by rolling nine
six-coloured dice in a little container, and then each player slides
their tiles around to match the pattern. Sometimes the target pattern
is invalid – when it has 5 or more of the same colour – but how
often exactly?
The fact that you have to re-roll the dice occasionally is hardly a
big inconvenience, but it’s a fun exercise to think about the chance
of it happening. We tried to guess the rate and we came up with the
broad range of somewhere between 1% and 20%.
Random sampling
One simple way to estimate the rate is to (virtually) roll the dice a
lot of times and count how often it happens.
Running this with 10,000 rolls gives these results:
2,1586
3,5518
4,2391
5,453
6,45
7,7
So we can see that 505 times out of 10,000 we would have to re-roll,
about 5% of the time. More than half of the time, the most common
colour will be showing on exactly 3 dice.
Exhaustive counting
Can we get a more precise answer? We could of course sample more, but
in this case there are not that many combinations in total – only
6^9, or about 10 million – so we can simply try them all.
So exactly 541176/10077696 times we’ll have to re-roll (about 5.73%).
The only problem remaining with this program is that it’s slow: on my
laptop it took about 22 seconds to run. How can we make it faster?
Faster
One way to make it faster is to observe that the colours are
interchangeable so we’re counting separately patterns that look
superficially different but are actually the same. (For example, the
pattern with 6 reds, 2 greens and 1 blue is spiritually the same as
the one with 6 greens, 2 yellows and 1 orange.) We can avoid some of
the repetition by forcing the first die to come up, say, Red.
Then we multiply the resulting counts by 6 to get the full total. As
expected, the running time is now just 1/6 of before, about 3.5
seconds.
Why is this correct? Think of it this way: we’ve counted all the
different patterns that start with a red. But there’s nothing special
about red; if we repeated the exercise using blue instead we’d get
exactly the same counts. So the counts are correct for every starting
colour, we just have to add them up to get the true counts.
A direct approach
So far we haven’t used any real mathematical knowledge other than
counting. (A bit of programming knowledge and spare CPU time can make
up for a lot of mathematical ignorance.) If we just want to know how
often we’d have to re-roll – without getting the distribution of
“number of the most commonly occurring colour” – we can apply some
high-school probability theory.
That is, we compute the probability of getting 5, 6, 7, 8 or 9 “red”
faces from a roll of 9 dice, then add those up. We multiply the final
result by 6 for the same reason as before.
The answer is 22549 % 419904, which is the same as before in reduced
form.
Dynamic Programming
But what if you really, truly, want the distribution of the number of
the most commonly occurring colour, and you’re not willing to wait a
few seconds to get it? Another approach is dynamic programming:
This gives the same result as the exhaustive counting from earlier,
but in the more acceptable running time of a couple of milliseconds.