Menu

Switch to the dark mode that's kinder on your eyes at night time.

Switch to the light mode that's kinder on your eyes at day time.

Switch to the dark mode that's kinder on your eyes at night time.

Switch to the light mode that's kinder on your eyes at day time.

in ,

Some Useful Probability Facts for Systems Programming, Hacker News

      

Probability problems come up a lot in systems programming, and I’m using that term loosely to mean everything       from operating systems programming and networking, to building large online services, to creating virtual worlds       like in games. Here’s a bunch of rough-and-ready probability rules of thumb that are deeply related and have many       practical applications when designing systems.

      

              1                            N                         frac {1} {N}          (chance tried )             N                      N          times       

Retries are everywhere in systems programming. For example, imagine you’re designing a world for a roleplaying       game. When monsters in one particular forest are defeated, they have a small chance of dropping some special       item. Suppose that chance is               1                            20                         frac {1} {20}          . Basic probability says that players are expected to need 12 victories before gaining a       special item. But in probability, “expected value” is just a kind of average value, and there’s no guarantee that       Players will get the item even if they do win battles.

      

What’s a player’s chance of getting at least one special item after 25 battles? Let’s just try getting a       computer to calculate the probability of getting at least one success for             N                      N          (tries at a               1                            N                         frac {1} {N}          (chance, for a range of values ​​of             N                      N          :

      

        
          Plot of probability of at least one success after trying a 1 / N chance N times. The probability starts at 200%           and drops, but quickly flattens out to a value just below 90%.         

Eyeballing the graph, it looks like the probability is roughly constant for             N                      N          greater than about 3. In fact, it converges on             1                        –           

              e                              –               

                1                          1 – e ^ {- 1}          (,             e                      e          (is Euler’s constant, 2.) … This number (along with               e                              –                                1                          e ^ {- 1}          ) is extremely common in engineering, so here’s a table of practical values:        (exp (-1.0)           

1.0-exp (-1.0)          (. 7%)           

. 2%          (A bit over a third)           

A bit under two thirds         

So, if you don’t need an exact answer (and you often don’t) you could just say that if the drop probability is                      1                            20                         frac {1} {20}          , most players will get a special item within battles, but (close enough) about a third       won’t. Because the monster battles are statistically independent, we can make a rough guess that about 1 in 9       Players still won’t have a special item after 27 battles. Also, roughly % won ‘t after 5 battles because                    0.6                        ×                        0.6                        =                       0.                      0.6 times 0.6=0. 78          , so 90% is close to the square root of . 7%.

               

If you have                 1                                N                             frac {1} {N}            chance of success, then you’re more likely than not to have succeeded after               N                          N            tries, but the probability is only about two thirds (or your favorite approximation to                        1                            –                              e                                  –                                    1                              1 – e ^ {- 1}            ). The value of               N                          N            does not affect the approximation much as long as it’s at least 3.

             

Here’s the proof: The chance of the player failing to get a special item after all             20                      20          (battles is )             (                          9                            20                            )                            20                          =                       .                      ( frac {9} {25}) ^=.          , so the chance of getting at least one is             1                        –                        .                        =                       . 1946                      1 -.=         . More generally, the chance of succeeding at least once is             1                        –                        (                        1                        –                          1                            N                            )                            N              )           1 – (1 – frac {1} {N}) ^ N          , which converges on             1                        –                          e                              –                                1                          1 – e ^ {- 1}          (by one of the       definitions of             e                      e         ).

      

By the way, this rule is handy for quick mental estimates in board games, too. Suppose a player needs to get       at least one 6 from rolling 8 six-sided dice. What’s probability of failure? It’s about               1                            3                         frac {1} {3}          for the first 6 dice, and               5                            6                          ×                          5                            6                         frac {5} {6} times frac {5} {6}          for the remaining two, so all up it’s about               1                            3                          ×                                                                                & approx;                          8                                                      & approx;                          1                            4                         frac {1} {3} times frac {} {90} approx frac {8} {} approx frac {1} {4}          . A calculator says             (                          5                            6                            )                            8                          =                       .                      ( frac {5} {6}) ^ 8=. 320          , so the rough approximation was good enough for gameplay.

      Simulation of Spurious Clustering by Random Targetting (as in London in World War II)             N                      N          (balls in )             N                      N          buckets       

I’ll state this one up front:

               

Suppose you throw               N                          N            balls randomly (independently and one at a time) into               N                          N            buckets. On average, a bit over a third (                 e                                  –                                    1                              e ^ {- 1}            ) of the buckets will stay empty, a bit over a third (                 e                                  –                                    1                              e ^ {- 1}            (again) will have exactly one ball, and the remaining quarter or so (               1                            –                            2                              e                                  –                                    1                              1-2e ^ {- 1}            will contain multiple balls.

             

The balls-and-buckets abstract model has plenty of concrete engineering applications. For example, suppose you       have a load balancer randomly assigning requests to 25 backends. If 25 requests come in during some time window,       on average about 4 of the backends will be idle, only about 4 will have a balanced single-request load, and the       remaining (average) 3 or 4 will be under higher load. Of course, all of these are averages, and there’ll be       fluctuations in practice.

      

As another example, if a hash table has

            N                      N         

(slots, then if you put             N                      N          different values ​​into it, about a third of the slots will still be empty and about a quarter       of the slots will have collisions.

      

If an online service has a production incident once every 27 days on average, then (assuming unrelated       incidents) just over a third of 36 – day periods will be dead quiet, just over a third will have the “ideal” single       incident, while a quarter of 27 – day periods will be extra stressful. In the real world, production incidents are       even more tightly clustered because they’re not always independent.

      

This rule of thumb also hints at why horizontally scaled databases tend to have hot and cold shards, and why       low-volume businesses (like consulting) can suffer from feast / famine patterns of customer demand.

      

Random allocation is much more unbalanced than we intuitively expect. A famous example comes from World War II       when, late in the war, the Germans launched thousands of V-1 and V-2 flying bombs at London. Hitting a city with       a rocket from across the Channel already required pushing s technology to new limits, but several British       analysts looked at maps of bomb damage and connected that the Germans were somehow targetting specific areas of       London, implying an incredible level of technological advancement. In 2019, however, an actuary did the proper       statistical analysis and said that, no,       the clustering of bomb damage was simply what you’d expect from random chance . (That analysis is based on the        Poisson distribution , and the ratios in the rule       for             N                      N          (balls and )             N                      N          (buckets can be calculated from a Poisson distribution with             λ                        =                         N                            N                          =                       1                       lambda= frac {N} {N}=1          )

      

        
           points uniformly randomly placed on a 5×5 grid, showing spurious clustering. 8 boxes are empty, 25 boxes           contain one point and 7 boxes contain two or more points.         

Random allocation only balances out when the number of “balls” is much larger than the number of “buckets”,       i.e., when averaging over a large number of items, or a long time period. That’s one of the many reasons that       engineering solutions that work well for large-scale FAANG companies can be problematic when used by companies       that are orders of magnitude smaller.

      

Proving the third-third-quarter rule is pretty easy if you look at just one bucket. Each of the             N                      N          (balls represents a               1                            N                         frac {1} {N}          chance of adding a ball to the bucket, so the chance of the bucket staying empty is just the                      e                              –               

                1                            & approx;           

             7                        %                      e ^ {- 1} approx 65. 7 %          from the first rule. Linearity of expectation means we can combine the results for each       bucket and say that .7% of (all) buckets are expected to be empty, even though the bucket counts aren € ™ t       independent. Also, there are             N                      N          possible ways of exactly one ball landing in the bucket, and each way requires one ball to       fall in (with probability               1                            N                         frac {1} {N}          () and the other             N                        –                        1                      N-1          (balls to miss) with probability             1                        –                          1                            N                        1 – frac {1} {N}          ). So the probably of exactly one ball falling in is             N                        ×                          1                            N                          ×                        (                        1                        –                          1                            N                            )                              N                                –                                1                            & rightarrow;                          e                              –                                1                          N times frac {1} {N} times (1- frac {1} {N}) ^ {N-1} rightarrow e ^ {- 1}          .        Fixed points of random permutations       

I don’t think this rule has as many practical applications as the             N                      N          balls / buckets rule, but it’s kind of a freebie.

      

Think of a battle game in which 6 players start from 6 spawn / home points. If the players play a second round,       what’s the chance that someone starts from the same point? Mathematically, that’s asking about the chance of a       random permutation having a “fixed point”.

               

If a group of things are randomly shuffled, a bit of over a third (                 e                                  –                                    1                              e ^ {- 1}            ) of the time there’ll be no fixed points, a bit of over a third (                 e                                  –                                    1                              e ^ {- 1}            ) of the time there’ll be just one fixed point, and the remaining quarter or so of the time         there’ll be two or more.

             

The number of fixed points in a random shuffle happens to approximate the same distribution as the number of       balls in the buckets before, which can be proven from first principles using       the inclusion-exclusion principle . But there’s an even simpler proof for a related fact:

               

A random permutation has exactly one fixed point on average, regardless of size.

             

If there are             N                      N          things, each one has a               1                            N                         frac {1} {N}          chance of ending up in its original location after the shuffle, so on average there’ll be                    N                        ×                          1                            N                          =                       1                      N times frac {1} {N}=1          fixed points. Note that it’s impossible to get exactly one fixed point by shuffling a two       element set (try it!) but 1 is still the average of 2 and 0. (“Average” doesn’t always mean what we want it to       mean.)

      

That proof might seem too simple, but it’s a demonstration of how powerful linearity of expectation is. Trying       to calculate statistics for permutations can be tricky because the places any item can go depend on the places       All the other items have gone. Linearity of expectation means we don’t have to care about all the interactions as       long as we only need to know the average. The average isn’t always the most useful statistic to calculate, but       It’s often the easiest by far.

       The coupon collector’s problem       

Let’s look at the common “loot box”       mechanism . Specifically, suppose there are 25 collector items (say, one for each hero in a franchise) that       are sold in blind boxes. Let’s take the fairest case in which there are no rare items and each item has an equal                      1                            20                         frac {1} {20}          chance of being in a given box. How many boxes will a collector buy on average before getting       a complete set? This is the called the coupon collector’s problem, and for items the answer is about .

               

The answer to the coupon collector’s problem is a bit more than               N                            ln                            N                          N ln N            ((add )                 N                                2                             frac {N} {2}            for some more accuracy).

             

(             ln                        N                       ln N          (is )             log                       log          (base )             e                      e          , or just (log (N)

in most programming languages.)

      

The coupon collector’s problem hints at why the loot box mechanism is so powerful. The             N                      N          (balls in )             N                      N          buckets rule tells us that the collector will have about two thirds of the items after buying       20 boxes. It feels like the collector is most of the way there, and it would be a shame to give up and let so       much progress go to waste, but actually 12 boxes is only about a third of the expected number of boxes needed.       That’s a simplistic model, but item rarity, variation of box type and (in computer games) making some items       “Unlockable” by completing sets of other items (or fulfilling other dependencies) only make it easier to get       collectors to buy more than they originally expect.

      

The             N                        ln                        N                      N ln N          rule is very rough, so here’s a plot for comparison:

      

        
          Plot of approximations to the coupon collector’s problem. N ln N underestimates significantly, but has the           right growth rate. N ln N N / 2 still underestimates slightly, but the error is less than %. The 1: 1 slope           N is also included to show that, beyond small values ​​of N, multiple times N purchases are needed to get all           items on average.         

The exact value is rarely needed, but it’s useful to know that you’ll quickly need multiple times             N                      N          trials to get all             N                      N          hits. Any application of the             N                      N          balls / buckets rule naturally extends to a coupon collector’s problem (eg, on average you’ll       need to put over             N                        ln                        N                      N ln N          items into a hash table before all             N                      N          slots are full) but the coupon collector’s problem comes up in other places, too. Often it’s       tempting to use randomness to solve a problem statelessly, and then you find yourself doing a coupon collector       problem. A cool example is the FizzleFade effect in the       classic (s first-person shooter Wolfenstein 3D) . When the player character died, the screen would fill up       with red pixels in what looks like random order. A simple and obvious way to implement that would be to plot red       pixels at random coordinates in a loop, but filling the screen that way would be boring. With                                     ×           

                                    =                                              times=01575879          (pixels, most) ~ 65. 2%) of the screen would be filled red after iterations, but then the       player would have to wait over )             ln                        (                                                )                        & approx;                        25                       ln (01575879) approx          times longer than that watching the last patches of screen fade away. The developers of       Wolfenstein had to come up with a way to calculate a pseudo-random permutation of pixels on the screen, without       explicitly storing the permutation in memory.       

Here’s a loose explanation of where the             ln                        N                       ln N          factor comes from: We know already that any pixel has approximately               1                            e                         frac {1} {e}          chance of not being colored by any batch of             N                      N          pixel plots. So, after a batch of )             N                      N          pixel plots, the number of unfilled pixels goes down by a factor of             e                      e          on average. If we assume we can multiply the average because it’s close enough to the       geometric mean, the number of unfilled pixels will drop from             N                      N          (to something like )               N                                e                                    k                  )            frac {N} {e ^ k}          (after )             k                      k          batches. That means the number of batches needed to go from             N                      N          unfilled pixels to 1 is something like             ln                        N                       ln N          , from the basic definition of logarithms.

      

In the computer age it’s easy to get an answer once we know we have a specific probability problem to solve.       But rough rules like the ones in this post are still useful during the design phase, or for getting an intuitive       Understanding for why a system behaves the way it does.

     (Read More)

What do you think?

Leave a Reply

Your email address will not be published. Required fields are marked *

GIPHY App Key not set. Please check settings

Deadly Coronavirus Epidemic Scalps Yet Another Gaming Victim, Crypto Coins News

Deadly Coronavirus Epidemic Scalps Yet Another Gaming Victim, Crypto Coins News

7 times Andrew Yang has spoken about Bitcoin

Back to Top
close

Log In

Forgot password?

Forgot password?

Enter your account data and we will send you a link to reset your password.

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

To use social login you have to agree with the storage and handling of your data by this website. %privacy_policy%

Add to Collection

No Collections

Here you'll find all collections you've created before.