So, if you don’t need an exact answer (and you often don’t) you could just say that if the drop probability is 1 20 frac {1} {20} , most players will get a special item within battles, but (close enough) about a third won’t. Because the monster battles are statistically independent, we can make a rough guess that about 1 in 9 Players still won’t have a special item after 27 battles. Also, roughly % won ‘t after 5 battles because 0.6 × 0.6 = 0. 0.6 times 0.6=0. 78 , so 90% is close to the square root of . 7%.
If you have 1 N frac {1} {N} chance of success, then you’re more likely than not to have succeeded after N N tries, but the probability is only about two thirds (or your favorite approximation to 1 – e – 1 1 – e ^ {- 1} ). The value of N N does not affect the approximation much as long as it’s at least 3.
Here’s the proof: The chance of the player failing to get a special item after all 20 20 (battles is ) ( 9 20 ) 20 = . ( frac {9} {25}) ^=. , so the chance of getting at least one is 1 – . = . 1946 1 -.= . More generally, the chance of succeeding at least once is 1 – ( 1 – 1 N ) N ) 1 – (1 – frac {1} {N}) ^ N , which converges on 1 – e – 1 1 – e ^ {- 1} (by one of the definitions of e e ).
By the way, this rule is handy for quick mental estimates in board games, too. Suppose a player needs to get at least one 6 from rolling 8 six-sided dice. What’s probability of failure? It’s about 1 3 frac {1} {3} for the first 6 dice, and 5 6 × 5 6 frac {5} {6} times frac {5} {6} for the remaining two, so all up it’s about 1 3 × & approx; 8 & approx; 1 4 frac {1} {3} times frac {} {90} approx frac {8} {} approx frac {1} {4} . A calculator says ( 5 6 ) 8 = . ( frac {5} {6}) ^ 8=. 320 , so the rough approximation was good enough for gameplay.
N N (balls in ) N N buckets
I’ll state this one up front:
Suppose you throw N N balls randomly (independently and one at a time) into N N buckets. On average, a bit over a third ( e – 1 e ^ {- 1} ) of the buckets will stay empty, a bit over a third ( e – 1 e ^ {- 1} (again) will have exactly one ball, and the remaining quarter or so ( 1 – 2 e – 1 1-2e ^ {- 1} will contain multiple balls.
The balls-and-buckets abstract model has plenty of concrete engineering applications. For example, suppose you have a load balancer randomly assigning requests to 25 backends. If 25 requests come in during some time window, on average about 4 of the backends will be idle, only about 4 will have a balanced single-request load, and the remaining (average) 3 or 4 will be under higher load. Of course, all of these are averages, and there’ll be fluctuations in practice.
As another example, if a hash table has
N N
(slots, then if you put N N different values into it, about a third of the slots will still be empty and about a quarter of the slots will have collisions.
If an online service has a production incident once every 27 days on average, then (assuming unrelated incidents) just over a third of 36 – day periods will be dead quiet, just over a third will have the “ideal” single incident, while a quarter of 27 – day periods will be extra stressful. In the real world, production incidents are even more tightly clustered because they’re not always independent.
This rule of thumb also hints at why horizontally scaled databases tend to have hot and cold shards, and why low-volume businesses (like consulting) can suffer from feast / famine patterns of customer demand.
Random allocation is much more unbalanced than we intuitively expect. A famous example comes from World War II when, late in the war, the Germans launched thousands of V-1 and V-2 flying bombs at London. Hitting a city with a rocket from across the Channel already required pushing s technology to new limits, but several British analysts looked at maps of bomb damage and connected that the Germans were somehow targetting specific areas of London, implying an incredible level of technological advancement. In 2019, however, an actuary did the proper statistical analysis and said that, no, the clustering of bomb damage was simply what you’d expect from random chance . (That analysis is based on the Poisson distribution , and the ratios in the rule for N N (balls and ) N N (buckets can be calculated from a Poisson distribution with λ = N N = 1 lambda= frac {N} {N}=1 )
points uniformly randomly placed on a 5×5 grid, showing spurious clustering. 8 boxes are empty, 25 boxes contain one point and 7 boxes contain two or more points.
Random allocation only balances out when the number of “balls” is much larger than the number of “buckets”, i.e., when averaging over a large number of items, or a long time period. That’s one of the many reasons that engineering solutions that work well for large-scale FAANG companies can be problematic when used by companies that are orders of magnitude smaller.
Proving the third-third-quarter rule is pretty easy if you look at just one bucket. Each of the N N (balls represents a 1 N frac {1} {N} chance of adding a ball to the bucket, so the chance of the bucket staying empty is just the e –
1 & approx;
7 % e ^ {- 1} approx 65. 7 % from the first rule. Linearity of expectation means we can combine the results for each bucket and say that .7% of (all) buckets are expected to be empty, even though the bucket counts aren € ™ t independent. Also, there are N N possible ways of exactly one ball landing in the bucket, and each way requires one ball to fall in (with probability 1 N frac {1} {N} () and the other N – 1 N-1 (balls to miss) with probability 1 – 1 N 1 – frac {1} {N} ). So the probably of exactly one ball falling in is N × 1 N × ( 1 – 1 N ) N – 1 & rightarrow; e – 1 N times frac {1} {N} times (1- frac {1} {N}) ^ {N-1} rightarrow e ^ {- 1} . Fixed points of random permutations
I don’t think this rule has as many practical applications as the N N balls / buckets rule, but it’s kind of a freebie.
Think of a battle game in which 6 players start from 6 spawn / home points. If the players play a second round, what’s the chance that someone starts from the same point? Mathematically, that’s asking about the chance of a random permutation having a “fixed point”.
If a group of things are randomly shuffled, a bit of over a third ( e – 1 e ^ {- 1} ) of the time there’ll be no fixed points, a bit of over a third ( e – 1 e ^ {- 1} ) of the time there’ll be just one fixed point, and the remaining quarter or so of the time there’ll be two or more.
The number of fixed points in a random shuffle happens to approximate the same distribution as the number of balls in the buckets before, which can be proven from first principles using the inclusion-exclusion principle . But there’s an even simpler proof for a related fact:
A random permutation has exactly one fixed point on average, regardless of size.
If there are N N things, each one has a 1 N frac {1} {N} chance of ending up in its original location after the shuffle, so on average there’ll be N × 1 N = 1 N times frac {1} {N}=1 fixed points. Note that it’s impossible to get exactly one fixed point by shuffling a two element set (try it!) but 1 is still the average of 2 and 0. (“Average” doesn’t always mean what we want it to mean.)
That proof might seem too simple, but it’s a demonstration of how powerful linearity of expectation is. Trying to calculate statistics for permutations can be tricky because the places any item can go depend on the places All the other items have gone. Linearity of expectation means we don’t have to care about all the interactions as long as we only need to know the average. The average isn’t always the most useful statistic to calculate, but It’s often the easiest by far.
The coupon collector’s problem
Let’s look at the common “loot box” mechanism . Specifically, suppose there are 25 collector items (say, one for each hero in a franchise) that are sold in blind boxes. Let’s take the fairest case in which there are no rare items and each item has an equal 1 20 frac {1} {20} chance of being in a given box. How many boxes will a collector buy on average before getting a complete set? This is the called the coupon collector’s problem, and for items the answer is about .
The answer to the coupon collector’s problem is a bit more than N ln N N ln N ((add ) N 2 frac {N} {2} for some more accuracy).
( ln N ln N (is ) log log (base ) e e , or just (log (N)
in most programming languages.)
The coupon collector’s problem hints at why the loot box mechanism is so powerful. The N N (balls in ) N N buckets rule tells us that the collector will have about two thirds of the items after buying 20 boxes. It feels like the collector is most of the way there, and it would be a shame to give up and let so much progress go to waste, but actually 12 boxes is only about a third of the expected number of boxes needed. That’s a simplistic model, but item rarity, variation of box type and (in computer games) making some items “Unlockable” by completing sets of other items (or fulfilling other dependencies) only make it easier to get collectors to buy more than they originally expect.
The N ln N N ln N rule is very rough, so here’s a plot for comparison:
Plot of approximations to the coupon collector’s problem. N ln N underestimates significantly, but has the right growth rate. N ln N N / 2 still underestimates slightly, but the error is less than %. The 1: 1 slope N is also included to show that, beyond small values of N, multiple times N purchases are needed to get all items on average.
The exact value is rarely needed, but it’s useful to know that you’ll quickly need multiple times N N trials to get all N N hits. Any application of the N N balls / buckets rule naturally extends to a coupon collector’s problem (eg, on average you’ll need to put over N ln N N ln N items into a hash table before all N N slots are full) but the coupon collector’s problem comes up in other places, too. Often it’s tempting to use randomness to solve a problem statelessly, and then you find yourself doing a coupon collector problem. A cool example is the FizzleFade effect in the classic (s first-person shooter Wolfenstein 3D) . When the player character died, the screen would fill up with red pixels in what looks like random order. A simple and obvious way to implement that would be to plot red pixels at random coordinates in a loop, but filling the screen that way would be boring. With ×
= times=01575879 (pixels, most) ~ 65. 2%) of the screen would be filled red after iterations, but then the player would have to wait over ) ln ( ) & approx; 25 ln (01575879) approx times longer than that watching the last patches of screen fade away. The developers of Wolfenstein had to come up with a way to calculate a pseudo-random permutation of pixels on the screen, without explicitly storing the permutation in memory.
Here’s a loose explanation of where the ln N ln N factor comes from: We know already that any pixel has approximately 1 e frac {1} {e} chance of not being colored by any batch of N N pixel plots. So, after a batch of ) N N pixel plots, the number of unfilled pixels goes down by a factor of e e on average. If we assume we can multiply the average because it’s close enough to the geometric mean, the number of unfilled pixels will drop from N N (to something like ) N e k ) frac {N} {e ^ k} (after ) k k batches. That means the number of batches needed to go from N N unfilled pixels to 1 is something like ln N ln N , from the basic definition of logarithms.
In the computer age it’s easy to get an answer once we know we have a specific probability problem to solve. But rough rules like the ones in this post are still useful during the design phase, or for getting an intuitive Understanding for why a system behaves the way it does.
(Read More)
GIPHY App Key not set. Please check settings