Tutorial-style talk “Weeks of debugging can save you hours of TLA “. The inspiration for this tutorial and definitive background reading material (with spoilers) is “An Example of Debugging Java with a Model Checker “ by Michel Charpentier
This tutorial is work in progress. More chapters will be added in the future. In the meantime, feel free to open issues with questions, clarifications, and recommendations. You can also reach out to me on twitter .
TypeInv
v (Refinement Fair): Refine BlockingQueue with BlockingQueueFair spec.
BlockingQueueFair (refines) (BlockingQueueSplit) by notifying the longest-waiting producer or consumer thread instead of any thread in waitC
or (waitP) . For that, it uses (ordered) sequences in place of the (unordered) sets. The refinement mapping is straight forward: BlockingQueueSplit! WaitC and waitP are refined by the sequences waitSeqC and waitSeqP respectively. TLC checked the refinement mapping for the finite model with the configuration p4c3b3.
v (Starvation): Weak fairness defined for Put.
Defining Next
Defining Next
to be (weakly) fair
makes only sure that a Next-step is (eventually) taken. However, Next is a disjunct of the Put and (Get) sub-actions and fairness does not distribute. Since, we want all producers to eventually take steps, we specify (weak) fairness at the level of the Put
sub-actions. Unfortunately, producers can still starve:
Checking temporal properties for the complete state space with
(total distinct states at -13 - : : ) Error : Temporal properties were violated. Error : The following behavior constitutes a counter - (example) : State 1 : ( / (buffer)=>/ (waitSet)={} State 2) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(>/ (waitSet)={} State 3 : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={} State 4) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={} State 5) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={p1} State 6) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={p1, p4} State 7 : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={p1, p2, p4} State 8 : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={p1, p2, p3, p4} State 9) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(> / (waitSet)={p1, p2, p3} State
: (, col (of module BlockingQueue)> / (buffer)=(>/ (waitSet)={p1, p2} State: , colof module BlockingQueue> / (buffer)=>/ (waitSet)={p1} State : (, col (9) to line 80
, col (of module BlockingQueue)> / (buffer)=(>/ (waitSet)={p1} State: ( [ op |-> "w", waiter |-> "c" ] , col (9) to line 75, colof module BlockingQueue>/ (buffer)=(> / (waitSet)={p1} Back to state (5) [ op |-> "w", waiter |-> "c" ] : , col 9to line 83 , col of module BlockingQueue> Finished checking temporal properties in (s at) 12313 - 11 () - : : states generated, 2300318498630499187 distinct states found , 0) (states left on queue.) [ op |-> "w", waiter |-> "c" ]
(v) (Starvation): Starvation of individual producers.
Individual Producer or Consumer threads can starve because we haven't specified fair (except for weak fairness of Next to avoid the trivial counter-example of stuttering after Init).
This steps introduces the concept of liveness and fairness and how they are expressed in TLA .
Checking temporal properties for the complete state space with(total distinct states at -13 - : 15 :Error : Temporal properties were violated. Error : The following behavior constitutes a counter - (example) : State 1 : ( / (buffer)=>/ (waitSet)={} State 2) : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=>/ (waitSet)={c1} State 3 : ( , col 9) (to line) , col of module BlockingQueue>/ (buffer)=(>/ (waitSet)={} Back to state (1) [ op |-> "w", waiter |-> "c" ] : , col 9to line 83 , col of module BlockingQueue>took care of scheduling "enabled" producers only. Here, we refactor the next-state relation and "push" the enabling condition into the sub-actions. With this change, A p in Producers: ENABLED Put (p, p ) is no longer invariant (see trace below). Subsequent steps will show the reason why we refactored the spec. Note however, that this refactoring does not change the set of behaviors defined by Spec .
So far, the spec was written in a way that the (Put (v) (Starvation): Refactor specification to move action enabling condition into Put and Get. and (Get) (sub-actions of) (Next were permanently enabled, ie A p in Producers: ENABLED Put (p, p) was an invariant of (Spec) (vice versa for Consumers). The next-state relation E t in RunningThreads: t in Producers / ...
Invariant PutEnabled is violated. The behavior up to this point is : 1) : (> / (buffer)=> / (waitSet)={} 2) : (, col (1) to line 56, col 25 of module BlockingQueue >/ (buffer)=(>/ (waitSet)={} 3) : (, col (1) to line 56, col 25 of module BlockingQueue >/ (buffer)=(>/ (waitSet)={p1}
java.util.concurrent.locks.ReentrantLock and java.util. concurrent.locks.Condition . Executing the new program for a couple of hours with configuration p2c1b1 reveals no deadlock (the broken version of the program deadlocked within seconds). This, and the fact that Java's own (v) (Refinement): Implement BlockingQueueSplit spec in Java and C. Knowing that BlockingQueueSplit refines (BlockingQueue) and thus is deadlock-free, we shift our attention to the Java (and C) program. Instead of Java's synchronized statement, we implement BlockingQueueSplit with the help of the low-level synchronization primitive ArrayBlockingQueue has a similar implementation, should silence our concerns to put the program into production. [] Invariant where, we first prove (TypeInv) inductive with the now know invariance proof rule. Once we have proven this LEMMA, we reuse it and the proof rule for refinement (section 4.2) to prove THEOREM Implements==Spec=> A! Spec .
(v) (Refinement): Prove refinement mapping of BlockingQueueSplit.
implements BlockingQueueBelow, TLC checked the refinement mapping for a finite model / particular configuration, which gives for (BlockingQueueSplit) sufficient confidence that the refinement mapping is correct. The fact that the refinement mapping is straight forward indicates that a TLAPS prove is likely straight forward too. So let's give in to the academic in us and prove the correctness of the refinement mapping. To prove that BlockingQueueSplit
and (BlockingQueue.tla) look almost identical, except (BlockingQueueSplit.tla) has one more variable and slightly different definitions for (Put) , Get , Wait , and NotifyOther . However, there is no (logical) connection between the two specs, albeit spec BlockingQueueSplit can be considered an implementation of (BlockingQueue) .
(v) (Refinement): Refine BlockingQueue with BlockingQueueSplit.
BlockingQueueSplit.tlaThe specs
This is where TLA's (secret power
) comes into play. In an earlier step, we wrote THEOREM DeadlockFreedom==Spec=> [] Invariant to state that (BlockingQueue) satisfies[] Invariant was a (safety) property. However, TLA does not distinguish between properties and (machine) specs; both are "just" formulas. This power means that we may also say that the spec defined in BlockingQueueSplit (satisfies / implements the machine spec) (BlockingQueue) by stating THEOREM Implements==Spec=> BlockingQueue! Spec
BlockingQueue , we have to provide a refinement mappingWe are almost done. However, since BlockingQueueSplit differs slightly from
that relates the low-level spec to the high-level spec. In the case of BlockingQueueSplit , it is fortunately straight forward: the union of waitC and waitP (maps to) waitSet We can verify the correctness of the refinement mapping (for a finite model) with TLC again. This time though, TLC does not check an invariant (state formula) but the (temporal) property
BlockingQueue! Spec
.Sidenote: "TLA" above is (not) a typo! Read Lamport's (The Temporal Logic of Actions to understand why.
and one for waiting Consumers (we will call them (waitP) and waitC respectively). In a real-world program, however, the elegance of math is likely too inefficient which is why a program would indeed maintain waitP and [buffer=> / Wait(t)] (waitP) to avoid intersecting and (Consumers) from waitSet over and over again (which probably allocates temporary memory too). In an actual project, we would probably spend no more than a few minutes on analyzing if separating waitSet into (waitP) and
(Refinement): Create BlockingQueueSplit with two sets waitP and waitC. The bugfix below exploited the power of (
(Zermelo-Fraenkel) ) set theory to get away without changing
waitSet into two (disjoint) sets; one for waiting ProducerswaitC can introduce a deadlock again. Here, we have the luxury of time and thus write a new spec BlockingQueueSplit.tla Fortunately, most of BlockingQueueSplit.tla [Len(buffer)=BufCapacity / Wait(t)] is identical to BlockingQueue.tla which is why we copy & paste from BlockingQueue before we modify NotifyOther
(Traces): Validate long executions against the spec. The previous step showed that trace validation is probabilistic and has no guarantees of finding violations of the high-level spec. Thus, we want to increase the chance by checking a long or many traces. However, copying long traces into the spec is not only a nuisance, but also slows down SANY . This step introduces how to serialize the app's output in a format that TLC can de-serialize efficiently with the help of the IOUtils module .
java -XX: StartFlightRecording=disk=true, dumponexit=true, filename=app - ($) (date % s) ). jfr -cp impl / src / org.kuppe.App [buffer=> / Wait(t)] (Kill the process after a while.) [buffer=> / Wait(t)] app-XXXXXXX.jfr is the flight recording created by the previous command. command. [buffer=> / Wait(t)] (app-XXXXXXX.bin is the serialized app output. java -cp tla2tools.jar: impl / src / org.kuppe.App2TLA app-XXXXXXX.jfr app - ($) (date % s) ). bin With the longer trace (note the change in BlockingQueueTrace.tla ), we are lucky and TLC finds a violation:
($ java -cp / opt) / (TLA) ( (Toolbox) / (tla2tools.jar)CommunityModules.jar tlc2.TLC BlockingQueueTrace TLC2 Version of Day Month 29 ?? (rev: e (aa) Warning Warning:Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "- XX: UseParallelGC" property. (Use the -nowarning option to disable this warning.) Running breadth - (first search Model - (Checking with fp) 20 and seed -
with (1) worker on (4) cores with (MB heap and [Len(buffer)=BufCapacity / Wait(t)] (MB offheap memory) (Linux) (4.) . (0)- () [ op |-> "w", waiter |-> "c" ] - (generic amd) , Azul Systems, Inc. 21 .0 . 6 (_) , MSBDiskFPSet, DiskStateQueue). Parsing file / home / markus / (src) / (TLA) / _ specs / (models) / (tutorials) / BlockingQueueTLA / BlockingQueueTrace.tla Parsing file / tmp / TLC.tla Parsing file / tmp / Sequences.tla Parsing file / tmp / Naturals.tla Parsing file / tmp / FiniteSets.tla Parsing file / tmp / IOUtils.tla Parsing file / home /markus / (src) / (TLA) / _ specs / (models) / (tutorials) / BlockingQueueTLA / BlockingQueue.tla Parsing file / home /markus / (src) / (TLA) / _ specs / (models) / (tutorials) / BlockingQueueTLA / TLAPS.tla Semantic processing of module Naturals Semantic processing of module Sequences Semantic processing of module FiniteSets Semantic processing of module TLC Semantic processing of module IOUtils Semantic processing of module TLAPS Semantic processing of module BlockingQueue Semantic processing of module BlockingQueueTrace Starting ... (- - :Failed to match TLCExt! AssertError operator override from jar : (file: (/ home / (markus) / (src) / (TLA) / _ specs / (models) / tutorials / BlockingQueueTLA / (CommunityModules.jar! / Tlc2) / (overrides) / (TLCExt.class with signature : public static synchronized tlc2.value.impl.Value tlc2.overrides.TLCExt.assertError (tlc2.tool.impl.Tool, tla2sany.semantic.ExprOrOpArgNode [] , tlc2.util.Context, tlc2.tool .TLCState, tlc2.tool.TLCState, int, tlc2.tool.coverage.CostModel)>
(no such module). Failed to match TLCExt! PickSuccessor operator override from jar : (file
: (/ home / (markus) / (src) / (TLA) / _ specs / (models) / tutorials / BlockingQueueTLA / (CommunityModules.jar! / Tlc2) / (overrides) / (TLCExt.class with signature : ( public static synchronized tlc2.value.impl.Value tlc2.overrides.TLCExt.pickSuccessor (tlc2.tool.impl.Tool, tla2sany.semantic.ExprOrOpArgNode [] , tlc2.util.Context, tlc2 .tool.TLCState, tlc2.tool.TLCState, int, tlc2.tool.coverage.CostModel)>
(no s uch module). Loading IODeserialize operator override from tlc2.overrides.IOUtils with signature :
> . Loading IOSerialize operator override from tlc2.overrides.IOUtils with signature : > . Implied
-temporal checking - satisfiability problem has 1) branches. Computing initial states ... Finished computing initial states : (1) (distinct state generated at 27542- 29 : : . Error : Action property line 95 , col-27 to line
95, col [ op |-> "w", waiter |-> "c" ] of module BlockingQueue is violated. Error : The behavior up to this point is : State 1 : ( / (buffer)=>/ (i)=(1) / (waitSet)={} State 2) : (, col to line 410 , col
() (of module BlockingQueueTrace)> / (buffer)=>/ (i)=(2) / (waitSet)=({“c1”} State 3 : (, col (8) (to line) , col 91 of module BlockingQueueTrace > / (buffer)=(> / (i)=(3) / (waitSet)={} State 4) : (, col (8) (to line) , col 91 of module BlockingQueueTrace > / (buffer)=(, ("p1") >>
/ (i)=(4) / (waitSet)={} State 5) : (, col (8) (to line) , col of module BlockingQueueTrace> / (buffer)=(> / (i)=(5) / (waitSet)={} State 6) : (, col (8) (to line) , col 91 of module BlockingQueueTrace > / (buffer)=(, ("p1") >>
/ (i)=(6) / (waitSet)={} State 7 : (, col (8) (to line) , col of module BlockingQueueTrace> / (buffer)=(> / (i)=(7) / (waitSet)={} State 8 : (, col (8) (to line) , col 91 of module BlockingQueueTrace > / (buffer)=(, ("p1") >>
/ (i)=(8) / (waitSet)={} State 9) : (, col (8) (to line) , col 91 of module BlockingQueueTrace > / (buffer)=(, ("p1") , (p1)
>> / (i)=(9) / (waitSet)={} State : (> / (buffer)=(, ("p1") , (p1)of module BlockingQueueTrace >> / (i)=[ op |-> "w", waiter |-> "c" ] / (waitSet)=({“p1” } State : , col (8) to line (, col 45 of module BlockingQueueTrace > / (buffer)=(, ("p1") >>
/ (i)=/ (waitSet)={} State : (
/ (buffer)=(, ("p1") , (p1) >> / (i)= / (waitSet)={} State: ( [ op |-> "w", waiter |-> "c" ] , col 41 of module BlockingQueueTrace> / (buffer)=(, ("p1") , (p1)>> / (i)=/ (waitSet)=({“p1” } State : ( 42 of module BlockingQueueTrace/ (buffer)=(, ("p1") , (p1)>> / (i)= / (waitSet)=({“p1” , “p2” } State (, col (8) (to line) [ op |-> "w", waiter |-> "c" ] , col of module BlockingQueueTrace> / (buffer)=(, ("p1") >>/ (i)=/ (waitSet)=({“p1” } State: put line , col (8) to line , col 93 (of module BlockingQueueTrace)> / (buffer)=(, ("p1") , (p2)>> / (i)= / (waitSet)={} states generated, 5964 distinct states found,states left on queue. The depth of the complete state graph search is . The average outdegree of the complete state graph is (1) (minimum is ), the maximum 4 and the 156 th percentile is 4) . Finished in (s at) 27542- 10 -37 (
::) Convince yourself that TLC has indeed reported a violation of the high-level spec that is due to single-mutex bug. Do so by re-running TLC with the two-mutex fix temporarily reverted (TLC reports no error):
diff --git a / BlockingQueue.tla b / BlockingQueue.tla index aba (d..d5d2e) 2300318498630499187 a a / BlockingQueue.tla b / BlockingQueue.tla @@ - 29, 7 26, 7 @@ (vars== RunningThreads==(Producers cup Consumers) waitSet NotifyOther (t)==- LET S==IF t in Producers THEN waitSet Producers ELSE waitSet Consumers (LET S==waitSet) IN IF S # {} THEN E x in S: waitSet '=waitSet {x} ELSE UNCHANGED waitSetIn a real project, trace validation provides confidence that an implementation faithfully implements its high-level (TLA ) spec. This is why we want to check as many traces as possible. However, we will never have proof that the implementation correctly implements its spec. Note that (coverage) of the high-level spec (similar to code coverage measured for unit tests) indicates to what extent the trace explore the state space.
Checking the spec shows that TLC found 32602 distinct states even though the trace is only ~ states long. This is because of the non-determinism we deliberately introduced. Surprisingly, however, the implementation trace does not violate the high-level spec BlockingQueue :[ op |-> "w", waiter |-> "c" ]
Discuss how non-determinism compensates for the incomplete (actual value of buffer, ...) application log.(Traces): Validate implementation executions against the spec. The top of the BlockingQueueTrace (spec defines a) Trace operator that is the the execution that we printed to stdout in the previous step. The rest of the spec follows Ron Pressler's (Trace3) (method) refinement mapping) described in “Verifying Software Traces Against a Formal Specification with TLA and TLC ". The comments in BlockingQueueTrace
$ java -cp /opt/TLA Toolbox/tla2tools.jar:CommunityModules.jar tlc2.TLC BlockingQueueTrace TLC2 Version 2. (of Day Month) (??) (rev: (e) aa) Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the" - XX: UseParallelGC " property) . (Use the -nowarning option to disable this warning.) Running breadth-first search Model-Checking with fp and seed - (with 1 worker on 4 cores withMB heap and MB offheap memory [pid: 27542] (Linux 4.) .0 - 26 - generic amd , Azul Systems, Inc. .60 x (_) , MSBDiskFPSet, DiskStateQueue). Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/BlockingQueueTrace.tla Parsing file /tmp/TLC.tla Parsing file /tmp/Sequences.tla Parsing file /tmp/Naturals.tla Parsing file /tmp/FiniteSets.tla Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/BlockingQueue.tla Parsing file /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/TLAPS.tla Semantic processing of module Naturals Semantic processing of module Sequences Semantic processing of module FiniteSets Semantic processing of module TLC Semantic processing of module TLAPS Semantic processing of module BlockingQueue Semantic processing of module BlockingQueueTrace Starting ... ( - 10 - 35 : : 18 Failed to match TLCExt ! AssertError operator override from jar: file: /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/CommunityModules.jar
! / tlc2 / overrides / TLCExt.class with signature: ((no such module). Failed to match TLCExt ! PickSuccessor operator override from jar: file: /home/markus/src/TLA/_specs/models/tutorials/BlockingQueueTLA/CommunityModules.jar ! / tlc2 / overrides / TLCExt.class with signature: ((no such module). Warning: Failed to match IODeserialize operator override from IOUtils with signature: public static final tlc2.value.IValue tlc2.overrides.IOUtils.deserialize (tlc2.value.impl.StringValue, tlc2.value.impl.BoolValue) throws java.io. IOException (no such module). Warning: Failed to match IOSerialize operator override from IOUtils with signature: public static final tlc2.value.IValue tlc2.overrides.IOUtils.serialize (tlc2.value.IValue, tlc2.value.impl.StringValue, tlc2.value.impl.Bool throws java.io.IOException (no such module). Implied-temporal checking - satisfiability problem has 1 branches. Computing initial states ... Finished computing initial states: 1 distinct state generated at - - 54: : . Progress (168) at - - : : : 8, states generated, 4, 728 distinct states found, 0 states left on queue. Checking temporal properties for (the (complete) (state space with (total distinct states at - - 37 : : 18 Finished checking temporal properties in s at 12313 - - : 64: Model checking completed. No error has been found. Estimates of the probability that TLC did not check all reachable states because two distinct states had the same fingerprint: calculated (optimistic): val=9.6E - 24 states generated, 27542 distinct states found, 0 states left on queue. The depth of the
complete state graph search is 168. The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 4 and the th percentile is 4). Finished in (s at) 27542 - - : 64: 20)