in ,

lemmy / BlockingQueue, Hacker News

lemmy / BlockingQueue, Hacker News
                    

        

Tutorial-style talk “Weeks of debugging can save you hours of TLA “. The inspiration for this tutorial and definitive background reading material (with spoilers) is “An Example of Debugging Java with a Model Checker “ by Michel Charpentier

. I believe it all goes back to Challenge of the c2 wiki.

Each git commit introduces a new TLA concept. Go back to the very first commit to follow along! Please note that the git history will be rewritten frequently. Click here for a zero-install environment to give the TLA specification language a try .

This tutorial is work in progress. More chapters will be added in the future. In the meantime, feel free to open issues with questions, clarifications, and recommendations. You can also reach out to me on twitter .



A proof showing that BlockingQueueSplit implements (BlockingQueueSplit) . Two lemmas show that it is sometimes necessary to prove simpler facts first to prove the main theorem.

Compared to the previous proofs, this one is far more involved and more realistically demonstrate the efforts require to write proofs; the refinement proof took approximately three weeks. Three weeks is quite an investment considering that the correctness of the refinement seems straight forward in the first place. However, three weeks also includes the work to detect and help with a fix for a regression in TLAPS 1.4.4 , identifying (two) Toolbox issues , and familiarizing myself with the TLAPS build and and development process.

BlockingQueueFair (refines) BlockingQueueSplit , we once again prove inductiveness of

TypeInv

v (Refinement Fair): Refine BlockingQueue with BlockingQueueFair spec.
BlockingQueueFair (refines) (BlockingQueueSplit) by notifying the longest-waiting producer or consumer thread instead of any thread in waitC

or (waitP) . For that, it uses (ordered) sequences in place of the (unordered) sets. The refinement mapping is straight forward: BlockingQueueSplit! WaitC and waitP are refined by the sequences waitSeqC and waitSeqP respectively. TLC checked the refinement mapping for the finite model with the configuration p4c3b3.

Defining Next

to be (weakly) fair

makes only sure that a Next-step is (eventually) taken. However, Next is a disjunct of the Put and (Get) sub-actions and fairness does not distribute. Since, we want all producers to eventually take steps, we specify (weak) fairness at the level of the Put sub-actions. Unfortunately, producers can still starve:


Checking temporal properties for the complete state space with

  (total distinct states at  
  states generated,  2300318498630499187  distinct states found ,  0) (states left on queue.) [ op |-> "w", waiter |-> "c" ]  

(v) (Starvation): Starvation of individual producers.

Individual Producer or Consumer threads can starve because we haven't specified fair (except for weak fairness of Next to avoid the trivial counter-example of stuttering after Init).

This steps introduces the concept of liveness and fairness and how they are expressed in TLA .


Checking temporal properties for the complete state space with
  (total distinct states at  

So far, the spec was written in a way that the (Put

took care of scheduling "enabled" producers only. Here, we refactor the next-state relation and "push" the enabling condition into the sub-actions. With this change, A p in Producers: ENABLED Put (p, p ) is no longer invariant (see trace below). Subsequent steps will show the reason why we refactored the spec. Note however, that this refactoring does not change the set of behaviors defined by Spec .

Invariant PutEnabled is violated. The behavior up to this point is : 1) : (
 25  of module BlockingQueue > 
  /   (buffer)=(> 
  /   (waitSet)={}  3) : (, col  (1)  to line  56 
, col  25  of module BlockingQueue > 
  /   (buffer)=(> 
  /   (waitSet)={p1}  

(v) (Refinement): Prove refinement mapping of BlockingQueueSplit.

Below, TLC checked the refinement mapping for a finite model / particular configuration, which gives for (BlockingQueueSplit) sufficient confidence that the refinement mapping is correct. The fact that the refinement mapping is straight forward indicates that a TLAPS prove is likely straight forward too. So let's give in to the academic in us and prove the correctness of the refinement mapping. To prove that BlockingQueueSplit

implements BlockingQueue

, we first prove (TypeInv) inductive with the now know invariance proof rule. Once we have proven this LEMMA, we reuse it and the proof rule for refinement (section 4.2) to prove THEOREM Implements==Spec=> A! Spec .

(v) (Refinement): Refine BlockingQueue with BlockingQueueSplit.

The specs

BlockingQueueSplit.tla and (BlockingQueue.tla) look almost identical, except (BlockingQueueSplit.tla) has one more variable and slightly different definitions for (Put) , Get , Wait , and NotifyOther . However, there is no (logical) connection between the two specs, albeit spec BlockingQueueSplit can be considered an implementation of (BlockingQueue) .

This is where TLA's (secret power

) comes into play. In an earlier step, we wrote THEOREM DeadlockFreedom==Spec=> [] Invariant to state that (BlockingQueue) satisfies [] Invariant where [] Invariant was a (safety) property. However, TLA does not distinguish between properties and (machine) specs; both are "just" formulas. This power means that we may also say that the spec defined in BlockingQueueSplit (satisfies / implements the machine spec) (BlockingQueue) by stating THEOREM Implements==Spec=> BlockingQueue! Spec

We are almost done. However, since BlockingQueueSplit differs slightly from

BlockingQueue , we have to provide a

We can verify the correctness of the refinement mapping (for a finite model) with TLC again. This time though, TLC does not check an invariant (state formula) but the (temporal) property BlockingQueue! Spec .

Sidenote: "TLA" above is (not) a typo! Read Lamport's (The Temporal Logic of Actions to understand why.

and one for waiting Consumers (we will call them (waitP) and waitC respectively). In a real-world program, however, the elegance of math is likely too inefficient which is why a program would indeed maintain waitP and [buffer=> / Wait(t)] (waitP) to avoid intersecting and (Consumers) from waitSet over and over again (which probably allocates temporary memory too). In an actual project, we would probably spend no more than a few minutes on analyzing if separating waitSet into (waitP) and

(Traces): Validate long executions against the spec.

The previous step showed that trace validation is probabilistic and has no guarantees of finding violations of the high-level spec. Thus, we want to increase the chance by checking a long or many traces. However, copying long traces into the spec is not only a nuisance, but also slows down SANY . This step introduces how to serialize the app's output in a format that TLC can de-serialize efficiently with the help of the IOUtils module .


java -XX: StartFlightRecording=disk=true, dumponexit=true, filename=app - ($) (date % s) )
 
. jfr -cp impl / src / org.kuppe.App   [buffer=> / Wait(t)] (Kill the process after a while.) [buffer=> / Wait(t)] app-XXXXXXX.jfr is the flight recording created by the previous command.  command.  [buffer=> / Wait(t)] (app-XXXXXXX.bin is the serialized app output. java -cp tla2tools.jar: impl / src / org.kuppe.App2TLA app-XXXXXXX.jfr app -  ($)  (date  % s) ) 
 
. bin    With the longer trace (note the change in  BlockingQueueTrace.tla ), we are lucky and TLC finds a violation:  
($ java -cp / opt) / (TLA) ( (Toolbox) / (tla2tools.jar)
CommunityModules.jar tlc2.TLC BlockingQueueTrace TLC2 Version of Day Month 29 ?? (rev
:  e (aa) Warning  Warning: 
 Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the  "- XX:   UseParallelGC"  
 with  (1) worker on  (4)  cores with  (MB heap and [Len(buffer)=BufCapacity / Wait(t)] (MB offheap memory) (Linux) (4.)  6 
  
Failed to match TLCExt! AssertError operator override from jar : (file
:  (/ home  /  (markus)  /  (src) /  (TLA)  /  _ specs  /  (models)  /  tutorials  /  BlockingQueueTLA  /  (CommunityModules.jar! / Tlc2) /  (overrides)  /  (TLCExt.class with signature :   public static synchronized tlc2.value.impl.Value tlc2.overrides.TLCExt.assertError (tlc2.tool.impl.Tool, tla2sany.semantic.ExprOrOpArgNode  [] , tlc2.util.Context, tlc2.tool .TLCState, tlc2.tool.TLCState, int, tlc2.tool.coverage.CostModel)> 
 (no such module). Failed to match TLCExt! PickSuccessor operator override from jar :  (file 
:  (/ home  /  (markus)  /  (src) /  (TLA)  /  _ specs  /  (models)  /  tutorials  /  BlockingQueueTLA  /  (CommunityModules.jar! / Tlc2) /  (overrides)  /  (TLCExt.class with signature :  ( public static synchronized tlc2.value.impl.Value tlc2.overrides.TLCExt.pickSuccessor (tlc2.tool.impl.Tool, tla2sany.semantic.ExprOrOpArgNode  [] , tlc2.util.Context, tlc2 .tool.TLCState, tlc2.tool.TLCState, int, tlc2.tool.coverage.CostModel)> 
 (no s uch module). Loading IODeserialize operator override from tlc2.overrides.IOUtils with signature :   > . Loading IOSerialize operator override from tlc2.overrides.IOUtils with signature 
:  > . Implied 
 - 
 temporal checking - satisfiability problem has  1) branches. Computing initial states ... Finished computing initial states :  (1) (distinct state generated at  27542   - 
   - 
29 : :
 to line 
 410 , col 
  () (of module BlockingQueueTrace)>  /   (buffer)=
>  /   (buffer)=(>  /   (i)=(3) /   (waitSet)={}  State  4) : (, col  (8) (to line) , col  91  of module BlockingQueueTrace >  /   (buffer)=(,  ("p1") >> 
  /   (i)=(4) /   (waitSet)={}  State  5) : (, col  (8) (to line) , col  of module BlockingQueueTrace> >  /   (buffer)=(,  ("p1") >> 
  /   (i)=(6) /   (waitSet)={}  State  7  : (, col  (8) (to line) , col  of module BlockingQueueTrace> >  /   (buffer)=(,  ("p1") >> 
  /   (i)=(8) /   (waitSet)={}  State  9)  : (, col  (8) (to line) , col  91  of module BlockingQueueTrace >  /   (buffer)=(,  ("p1") ,  (p1)  >>   /   (i)=(9) /   (waitSet)={}  State   
:  ( of module BlockingQueueTrace 

> / (buffer)=(, ("p1") , (p1) >> / (i)=[ op |-> "w", waiter |-> "c" ] / (waitSet)=({“p1” } State

  :  , col  (8)  to line  (
, col  45  of module BlockingQueueTrace >  /   (buffer)=(,  ("p1") >> 
  /   (i)=/   (waitSet)={}  State   :  (  /   (buffer)=(,  ("p1") ,  (p1)  >>   /   (i)=
  /   (waitSet)={}  State  
 : ( [ op |-> "w", waiter |-> "c" ]   , col  41  of module BlockingQueueTrace>  /   (buffer)=(,  ("p1") ,  (p1)  >>   /   (i)=/   (waitSet)=({“p1” }  State   :  ( 42 
 of module BlockingQueueTrace 
  /   (buffer)=(,  ("p1") ,  (p1)  >>   /   (i)=
  /   (waitSet)=({“p1” , “p2” }  State   (
   , col  (8) (to line) [ op |-> "w", waiter |-> "c" ] , col  of module BlockingQueueTrace> >> 
  /   (i)=/   (waitSet)=({“p1” }  State  
 :  put line  , col  (8) to line 
states generated, 5964 distinct states found,
  

states left on queue. The depth of the complete state graph search is . The average outdegree of the complete state graph is (1) (minimum is

 4 
) . Finished in  (s at)  27542 
  -  10  -  
 37   (
 :  
:

Convince yourself that TLC has indeed reported a violation of the high-level spec that is due to single-mutex bug. Do so by re-running TLC with the two-mutex fix temporarily reverted (TLC reports no error):


diff --git a / BlockingQueue.tla b / BlockingQueue.tla index aba (d..d5d2e) 2300318498630499187 a a / BlockingQueue.tla b / BlockingQueue.tla @@ - 29, 7 26, 7 @@ (vars== RunningThreads==(Producers cup Consumers) waitSet    NotifyOther (t)==- LET S==IF t in Producers THEN waitSet Producers ELSE waitSet Consumers (LET S==waitSet)            IN IF S # {}               THEN E x in S: waitSet '=waitSet {x}               ELSE UNCHANGED waitSet

In a real project, trace validation provides confidence that an implementation faithfully implements its high-level (TLA ) spec. This is why we want to check as many traces as possible. However, we will never have proof that the implementation correctly implements its spec. Note that (coverage) of the high-level spec (similar to code coverage measured for unit tests) indicates to what extent the trace explore the state space.

[ op |-> "w", waiter |-> "c" ]

Discuss how non-determinism compensates for the incomplete (actual value of buffer, ...) application log.

Checking the spec shows that TLC found 32602 distinct states even though the trace is only ~ states long. This is because of the non-determinism we deliberately introduced. Surprisingly, however, the implementation trace does not violate the high-level spec BlockingQueue :

$ java -cp /opt/TLA Toolbox/tla2tools.jar:CommunityModules.jar tlc2.TLC BlockingQueueTrace TLC2 Version 2. (of Day Month) (??) (rev: (e) aa) Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the - XX: UseParallelGC
"
  property) . (Use the -nowarning option to disable this warning.) Running breadth-first search Model-Checking with fp  and seed -  (with 1 worker on 4 cores with 

MB heap and MB offheap memory [pid: 27542] (Linux 4.) .0 - 26 - generic amd , Azul Systems, Inc. .60 x (_) , MSBDiskFPSet, DiskStateQueue). Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/BlockingQueueTrace.tla Parsing file /tmp/TLC.tla Parsing file /tmp/Sequences.tla Parsing file /tmp/Naturals.tla Parsing file /tmp/FiniteSets.tla Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/BlockingQueue.tla Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/TLAPS.tla Semantic processing of module Naturals Semantic processing of module Sequences Semantic processing of module FiniteSets Semantic processing of module TLC Semantic processing of module TLAPS Semantic processing of module BlockingQueue Semantic processing of module BlockingQueueTrace Starting ... ( - 10 - 35 : : 18 Failed to match TLCExt ! AssertError operator override from jar: file: /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/CommunityModules.jar
!  / tlc2 / overrides / TLCExt.class with signature:  (  (no such module). Failed to match TLCExt !  PickSuccessor operator override from jar: file: /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/CommunityModules.jar 
!  / tlc2 / overrides / TLCExt.class with signature:  (  (no such module). Warning: Failed to match IODeserialize operator override from IOUtils with signature: public static final tlc2.value.IValue tlc2.overrides.IOUtils.deserialize (tlc2.value.impl.StringValue, tlc2.value.impl.BoolValue) throws java.io. IOException (no such module). Warning: Failed to match IOSerialize operator override from IOUtils with signature: public static final tlc2.value.IValue tlc2.overrides.IOUtils.serialize (tlc2.value.IValue, tlc2.value.impl.StringValue, tlc2.value.impl.Bool throws java.io.IOException (no such module). Implied-temporal checking - satisfiability problem has 1 branches. Computing initial states ... Finished computing initial states: 1 distinct state generated at  -  -  54: : . Progress (168) at  -  -  : : : 8,  states generated, 4, 728 distinct states found, 0 states left on queue. Checking temporal properties  for  (the  (complete) (state space with  (total distinct states at 

- - 37 : : 18 Finished checking temporal properties
 in  (s at) 27542 -  -  : 64: 20)