Another look at two Linux KASLR patches

A recent patchset proposed for Linux KASLR not only randomizes the kernel base address, but also reorders every function at boot time. As such, it no longer suffices to leak an arbitrary kernel function pointer, or so the logic goes.

Along with this patchset came a custom random number generator intended to be as fast as possible, so as to keep the boot time overhead at a minimum:

    /*
     * 64bit variant of Bob Jenkins' public domain PRNG
     * 256 bits of internal state
     */
    struct prng_state {
        u64 a, b, c, d;
    };

    static struct prng_state state;
    static bool initialized;

    #define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
    static u64 prng_u64(struct prng_state *x)
    {
        u64 e;

        e = x->a - rot(x->b, 7);
        x->a = x->b ^ rot(x->c, 13);
        x->b = x->c + rot(x->d, 37);
        x->c = x->d + e;
        x->d = e + x->a;

        return x->d;
    }

    static void prng_init(struct prng_state *state)
    {
        int i;

        state->a = kaslr_get_random_seed(NULL);
        state->b = kaslr_get_random_seed(NULL);
        state->c = kaslr_get_random_seed(NULL);
        state->d = kaslr_get_random_seed(NULL);

        /* discard the first outputs to mix the seed into the state */
        for (i = 0; i < 30; ++i)
            (void)prng_u64(state);

        initialized = true;
    }

    unsigned long kaslr_get_prandom_long(void)
    {
        if (!initialized)
            prng_init(&state);

        return prng_u64(&state);
    }

This was quickly decried as dangerous, and as Andy Lutomirski puts it,

> Ugh, don't do this. Use a real DRBG. Someone is going to break the
> construction in your patch just to prove they can.
>
> ChaCha20 is a good bet.

In the end, this random number generator was quickly removed, and that was that.
 But one can still wonder — is this generator secure but unanalyzed, or would it have been broken just to prove a point? 
The above generator was, as per the comment, derived from one of Bob Jenkins's small-state generators. It is, in particular, the following “three-rotation 64-bit variant”:

    typedef unsigned long long u8;
    typedef struct ranctx { u8 a; u8 b; u8 c; u8 d; } ranctx;

    #define rot(x,k) (((x)<<(k))|((x)>>(64-(k))))
    u8 ranval( ranctx *x ) {
        u8 e = x->a - rot(x->b, 7);
        x->a = x->b ^ rot(x->c, 13);
        x->b = x->c + rot(x->d, 37);
        x->c = x->d + e;
        x->d = e + x->a;
        return x->d;
    }

    void raninit( ranctx *x, u8 seed ) {
        u8 i;
        x->a = 0xf1ea5eed, x->b = x->c = x->d = seed;
        for (i=0; i<20; ++i) {
            (void)ranval(x);
        }
    }

 The core consists of the iteration of a permutation; we can easily compute its inverse iteration as 
    


 
    u8 ranval_inverse( ranctx *x ) {
        u8 e = x->d - x->a;
        x->d = x->c - e;
        x->c = x->b - rot(x->d, 37);
        x->b = x->a ^ rot(x->c, 13);
        x->a = e + rot(x->b, 7);
        return x->d;
    }
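As a quick sanity check that the inverse really undoes one step, here is a minimal Python sketch. It is a direct port of the two C routines, with the state held in a list `[a, b, c, d]` and 64-bit wraparound emulated by masking; the helper name `roundtrip_ok` and the use of Python's `random` module for test states are assumptions made for illustration, not part of the original code.

```python
import random

MASK = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def rot(x, k):
    """Rotate a 64-bit word left by k bits (0 < k < 64)."""
    return ((x << k) | (x >> (64 - k))) & MASK


def ranval(s):
    """One forward step on state s = [a, b, c, d]; returns the new d."""
    a, b, c, d = s
    e = (a - rot(b, 7)) & MASK
    a = b ^ rot(c, 13)
    b = (c + rot(d, 37)) & MASK
    c = (d + e) & MASK
    d = (e + a) & MASK
    s[:] = [a, b, c, d]
    return d


def ranval_inverse(s):
    """Undo one forward step on state s = [a, b, c, d]."""
    a, b, c, d = s
    e = (d - a) & MASK
    d = (c - e) & MASK
    c = (b - rot(d, 37)) & MASK
    b = a ^ rot(c, 13)
    a = (e + rot(b, 7)) & MASK
    s[:] = [a, b, c, d]
    return d


def roundtrip_ok(trials=1000, seed=1):
    """Hypothetical helper: step forward then backward on random states."""
    rng = random.Random(seed)
    for _ in range(trials):
        start = [rng.getrandbits(64) for _ in range(4)]
        s = list(start)
        ranval(s)
        ranval_inverse(s)
        if s != start:
            return False
    return True
```

Calling `roundtrip_ok()` steps random states forward and then backward, and reports whether every state came back unchanged.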


The core permutation present in ranval is depicted below.

[Figure: diagram of the core permutation of ranval]

 This resembles a Type-3 Feistel network, with some added operations for extra diffusion. Nevertheless, the resemblance still means that there are relatively few changes from one state to the next. 
The mode of operation, in modern terms, looks pretty much like a sponge pseudorandom generator with a capacity of 192 bits and a rate of 64 bits. As such, an ideal permutation in this mode of operation should be indistinguishable from a random stream until approximately $2^{96}$ captured 64-bit words.

Analysis

There are several ways to try and attack a pseudorandom generator:

- We can try to find a bias in its output stream;
- We can try to find a weakness in its initialization;
- We can try to recover an intermediate state from its output;
- Many more…







Our approach here will be the third one. The initialization, with its 20 rounds (or 30 in the KASLR version), is unlikely to have easily exploitable properties. Finding a bias in the output stream seems feasible, but in practical terms it has rather limited applicability.
Because the permutation is rather simple, we will try to model the problem algebraically. This means representing the problem as a multivariate system of equations over $\mathbb{F}_2$, where $a \cdot b$ means bitwise and, and $a \oplus b$ means bitwise xor. Since the permutation above consists only of a combination of additions, xor, and rotations, every operation is trivial to represent except addition (and subtraction).
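To see why xor and rotation are the easy cases, note that on a vector of bits a rotation only relabels positions and xor acts on each position independently, so neither introduces any product of variables. The following is a small Python sketch of that observation; the helper names (`to_bits`, `from_bits`, `rotl_bits`, `xor_bits`) and the convention that index 0 is the least significant bit are assumptions chosen for this illustration.

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_bits(x):
    """Split a word into a list of bits, index 0 = least significant."""
    return [(x >> i) & 1 for i in range(BITS)]


def from_bits(bits):
    """Reassemble a word from a list of bits, index 0 = least significant."""
    return sum(b << i for i, b in enumerate(bits))


def rotl_word(x, c):
    """Ordinary left rotation of a 64-bit word by c bits (0 < c < 64)."""
    return ((x << c) | (x >> (BITS - c))) & MASK


def rotl_bits(bits, c):
    """Left rotation as a relabeling: bit i moves to position (i + c) mod BITS."""
    return bits[-c:] + bits[:-c]


def xor_bits(a, b):
    """Xor is a single degree-1 equation per bit position."""
    return [x ^ y for x, y in zip(a, b)]
```

For example, `from_bits(rotl_bits(to_bits(x), 13))` agrees with `rotl_word(x, 13)` for any 64-bit `x`, which is why a rotation costs no new equations at all in the model.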
Let $x, y$ and $z$ be 64-bit variables, and $x_i$ (resp. $y_i, z_i$) indicate the $i$th bit of $x$ (resp. $y, z$). One can represent 64-bit addition $z = x \boxplus_{64} y$ as a recursive system:

$$ \begin{align} z_0 &= x_0 \oplus y_0 \newline c_0 &= x_0 \cdot y_0 \newline z_i &= x_i \oplus y_i \oplus c_{i-1} \newline c_i &= x_i \cdot y_i \oplus c_{i-1} \cdot (x_i \oplus y_i) \newline &= x_i \cdot y_i \oplus c_{i-1} \cdot x_i \oplus c_{i-1} \cdot y_i \end{align} $$

While this representation is quite simple, and can be represented purely as a function of the input bits, it is not good for analysis. This is because the algebraic degree is too high: the largest monomial $x_i x_j \dots y_k y_l \dots$ can have up to 64 variables. Working with polynomials of such high degree is not practical, due to memory and computational requirements, and therefore we do the most common trick in the business: if the system is too complex, add new variables to make it simpler:

$$ \begin{align} z_0 &= x_0 \oplus y_0 \newline z_i &= x_i \oplus y_i \oplus x_{i-1} \cdot y_{i-1} \oplus (z_{i-1} \oplus x_{i-1} \oplus y_{i-1}) \cdot (x_{i-1} \oplus y_{i-1}) \newline &= x_i \oplus y_i \oplus x_{i-1} \cdot y_{i-1} \oplus z_{i-1} \cdot x_{i-1} \oplus z_{i-1} \cdot y_{i-1} \oplus x_{i-1} \oplus y_{i-1} \end{align} $$

It is clear that this is equivalent to the above by checking that $c_{i-2} = z_{i-1} \oplus x_{i-1} \oplus y_{i-1}$ and substituting this into the expression for $c_{i-1}$. Now we add 64 extra variables for each addition, that is, the $z_i$ are actual variables in our equation system, but the algebraic degree remains 2.

The equation system for subtraction is the same as with addition, with a simple reordering of the variables. Alternatively, we can explicitly write it as

$$ \begin{align} z_0 &= x_0 \oplus y_0 \newline z_i &= x_i \oplus y_i \oplus (x_{i-1} \oplus 1) \cdot y_{i-1} \oplus (z_{i-1} \oplus x_{i-1} \oplus y_{i-1}) \cdot ((x_{i-1} \oplus 1) \oplus y_{i-1}) \newline &= x_i \oplus y_i \oplus x_{i-1} \cdot y_{i-1} \oplus z_{i-1} \cdot x_{i-1} \oplus z_{i-1} \cdot y_{i-1} \oplus z_{i-1} \oplus y_{i-1} \end{align} $$
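The degree-2 relations for addition and subtraction are easy to get wrong by a sign or an index, so it may help to check them numerically. The sketch below evaluates the expanded right-hand sides bit by bit against Python's own 64-bit arithmetic; the function names `add_relations_hold` and `sub_relations_hold` are hypothetical and exist only for this check.

```python
BITS = 64
MASK = (1 << BITS) - 1


def bit(v, i):
    """Return the i-th bit of v (bit 0 = least significant)."""
    return (v >> i) & 1


def add_relations_hold(x, y):
    """Check the quadratic relations for z = x + y mod 2^64."""
    z = (x + y) & MASK
    if bit(z, 0) != bit(x, 0) ^ bit(y, 0):
        return False
    for i in range(1, BITS):
        xp, yp, zp = bit(x, i - 1), bit(y, i - 1), bit(z, i - 1)
        rhs = (bit(x, i) ^ bit(y, i)
               ^ (xp & yp) ^ (zp & xp) ^ (zp & yp) ^ xp ^ yp)
        if bit(z, i) != rhs:
            return False
    return True


def sub_relations_hold(x, y):
    """Check the quadratic relations for z = x - y mod 2^64."""
    z = (x - y) & MASK
    if bit(z, 0) != bit(x, 0) ^ bit(y, 0):
        return False
    for i in range(1, BITS):
        xp, yp, zp = bit(x, i - 1), bit(y, i - 1), bit(z, i - 1)
        rhs = (bit(x, i) ^ bit(y, i)
               ^ (xp & yp) ^ (zp & xp) ^ (zp & yp) ^ zp ^ yp)
        if bit(z, i) != rhs:
            return False
    return True
```

Each relation involves only bits at positions $i$ and $i-1$ and only pairwise products, which is the property that keeps the full system at degree 2.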
Now it becomes quite straightforward to model the entire round as an equation system like above, reordering the equations such that it becomes a system of the form

$$ \begin{align} p_1(x_0, \dots) &= 0, \newline p_2(x_0, \dots) &= 0, \newline \dots & \newline p_l(x_0, \dots) &= 0, \end{align} $$

which we call the algebraic normal form, or ANF, of the system.
 Below we present a Python script that does exactly this, receiving a number of output leaks as arguments: 
    



    import sys

    BITS = 64

    # Allocate n fresh variable indices
    def VAR(n=BITS):
      if not hasattr(VAR, "counter"):
        VAR.counter = 0
      t = [ VAR.counter + i for i in range(n) ]
      VAR.counter += n
      return t

    # Rotate a list of variables left by c positions (index 0 = bit 0)
    def ROTL(x, c):
      z = x[:]
      for i in range(c):
        z = z[-1:] + z[0:-1]
      return z

    # Model c = a ^ b
    def XOR(c, a, b):