github/scientist

A Ruby library for carefully refactoring critical paths.

How do I science?

Let’s pretend you’re changing the way you handle permissions in a large web app. Tests can help guide your refactoring, but you really want to compare the current and refactored behaviors under load.

require "scientist"

class MyWidget
  def allows?(user)
    experiment = Scientist::Default.new "widget-permissions"
    experiment.use { model.check_user?(user).valid? } # old way
    experiment.try { user.can?(:read, model) } # new way

    experiment.run
  end
end

Wrap a use block around the code’s original behavior, and wrap try around the new behavior. experiment.run will always return whatever the use block returns, but it does a bunch of stuff behind the scenes:

It decides whether or not to run the try block,
Randomizes the order in which use and try blocks are run,
Measures the durations of all behaviors,
Compares the result of try to the result of use,
Swallows (but records) any exceptions raised in the try block, and
Publishes all this information.

The use block is called the control. The try block is called the candidate.
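As a rough mental model of the steps listed above, here is a minimal pure-Ruby sketch. It is not Scientist's implementation: TinyExperiment, its Observation struct, and the published attribute are hypothetical names invented for illustration. It also assumes the candidate always runs and that control exceptions simply propagate.

```ruby
# Hypothetical stand-in that mimics the listed steps; not the Scientist API.
class TinyExperiment
  Observation = Struct.new(:name, :value, :exception, :duration)

  attr_reader :published

  def use(&block)
    @control = block
  end

  def try(&block)
    @candidate = block
  end

  def run
    behaviors = { "control" => @control, "candidate" => @candidate }
    observations = {}

    # Randomize the order in which the two blocks run.
    behaviors.keys.shuffle.each do |name|
      observations[name] = observe(name, behaviors[name])
    end

    control = observations["control"]
    candidate = observations["candidate"]

    # "Publish" by storing a summary on the experiment itself.
    @published = {
      matched: candidate.exception.nil? && control.value == candidate.value,
      durations: observations.map { |name, obs| [name, obs.duration] }.to_h,
      candidate_error: candidate.exception
    }

    control.value # the caller only ever sees the control's result
  end

  private

  # Time one block; candidate exceptions are recorded, control ones propagate.
  def observe(name, block)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    value = block.call
    Observation.new(name, value, nil, elapsed_since(started))
  rescue StandardError => e
    raise if name == "control"
    Observation.new(name, nil, e, elapsed_since(started))
  end

  def elapsed_since(started)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

experiment = TinyExperiment.new
experiment.use { 1 + 1 }
experiment.try { raise ArgumentError, "candidate blew up" }
result = experiment.run # the control value, even though the candidate raised
```

The point of the sketch is the shape of run: the candidate can fail or disagree without the caller noticing, while everything observed along the way ends up in the published summary.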

Creating an experiment is wordy, but when you include the Scientist module, the science helper will instantiate an experiment and call run for you:

require "scientist"

class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |experiment|
      experiment.use { model.check_user(user).valid? } # old way
      experiment.try { user.can?(:read, model) } # new way
    end # returns the control value
  end
end

If you don’t declare any try blocks, none of the Scientist machinery is invoked and the control value is always returned.

Making science useful

The examples above will run, but they’re not really doing anything. The try blocks don’t run yet and none of the results get published. Replace the default experiment implementation to control execution and reporting:

require "scientist/experiment"

class MyExperiment
  include Scientist::Experiment

  attr_accessor :name

  def initialize(name:)
    @name = name
  end

  def enabled?
    # see "Ramping up experiments" below
    true
  end

  def publish(result)
    # see "Publishing results" below
    p result
  end
end

# replace `Scientist::Default` as the default implementation
module Scientist::Experiment
  def self.new(name)
    MyExperiment.new(name: name)
  end
end

Now calls to the science helper will load instances of MyExperiment.

Controlling comparison

Scientist compares control and candidate values using ==. To override this behavior, use compare to define how to compare observed values instead:

class MyWidget
  include Scientist

  def users
    science "users" do |e|
      e.use { User.all }         # returns User instances
      e.try { UserService.list } # returns UserService::User instances

      e.compare do |control, candidate|
        control.map(&:login) == candidate.map(&:login)
      end
    end
  end
end
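To see why a custom comparison is needed in a case like this, consider the following self-contained sketch. LegacyUser and ServiceUser are hypothetical stand-ins for User and UserService::User, and the sketch does not use Scientist at all; it only applies the same comparison the compare block performs.

```ruby
# Hypothetical stand-ins for User and UserService::User.
LegacyUser  = Struct.new(:login, :id)
ServiceUser = Struct.new(:login, :uuid)

control   = [LegacyUser.new("alice", 1), LegacyUser.new("bob", 2)]
candidate = [ServiceUser.new("alice", "a-1"), ServiceUser.new("bob", "b-2")]

# Default comparison: instances of different Struct classes are never ==.
default_match = (control == candidate)

# Comparison by login, as in the compare block.
login_match = (control.map(&:login) == candidate.map(&:login))

puts "default ==: #{default_match}, by login: #{login_match}"
# prints "default ==: false, by login: true"
```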

Adding context

Results aren’t very useful without some way to identify them. Use the context method to add to or retrieve the context for an experiment:

science "widget-permissions" do |e|
  e.context :user => user

  e.use { model.check_user(user).valid? }
  e.try { user.can?(:read, model) }
end

context takes a Symbol-keyed Hash of extra data. The data is available in Experiment#publish via the context method. If you’re using the science helper a lot in a class, you can provide a default context:

class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |e|
      e.context :user => user

      e.use { model.check_user(user).valid? }
      e.try { user.can?(:read, model) }
    end
  end

  def destroy
    science "widget-destruction" do |e|
      e.use { old_scary_destroy }
      e.try { new_safe_destroy }
    end
  end

  def default_scientist_context
    { :widget => self }
  end
end

The widget-permissions and widget-destruction experiments will both have a :widget key in their contexts.

Expensive setup

If an experiment requires expensive setup that should only occur when the experiment is going to be run, define it with the before_run method:

# Code under test modifies this in-place. We want to copy it for the
# candidate code, but only when needed:
value_for_original_code = big_object
value_for_new_code      = nil

science "expensive-but-worthwhile" do |e|
  e.before_run do
    value_for_new_code = big_object.deep_copy
  end
  e.use { original_code(value_for_original_code) }
  e.try { new_code(value_for_new_code) }
end

Keeping it clean

Sometimes you don’t want to store the full value for later analysis. For example, an experiment may return User instances, but when researching a mismatch, all you care about is the logins. You can define how to clean these values in an experiment:

class MyWidget
  include Scientist

  def users
    science "users" do |e|
      e.use { User.all }
      e.try { UserService.list }

      e.clean do |value|
        value.map(&:login)
      end
    end
  end
end

And this cleaned value is available in observations in the final published result:

class MyExperiment
  include Scientist::Experiment

  # ...

  def publish(result)
    result.control.value         # [<User alice>, <User bob>, <User carol>]
    result.control.cleaned_value # ["alice", "bob", "carol"]
  end
end

Note that the #clean method will discard the previous cleaner block if you call it again. If for some reason you need to access the currently configured cleaner block, Scientist::Experiment#cleaner will return the block without further ado. (This probably won’t come up in normal usage, but comes in handy if you’re writing, say, a custom experiment runner that provides default cleaners.)

Ignoring mismatches

During the early stages of an experiment, it’s possible that some of your code will always generate a mismatch for reasons you know and understand but haven’t yet fixed. Instead of these known cases always showing up as mismatches in your metrics or analysis, you can tell an experiment whether or not to ignore a mismatch using the ignore method. You may include more than one block if needed:

def admin?(user)
  science "widget-permissions" do |e|
    e.use { model.check_user(user).admin? }
    e.try { user.can?(:admin, model) }

    e.ignore { user.staff? } # user is staff, always an admin in the new system
    e.ignore do |control, candidate|
      # new system doesn't handle unconfirmed users yet:
      control && !candidate && !user.confirmed_email?
    end
  end
end

The ignore blocks are only called if the values don’t match. If one observation raises an exception and the other doesn’t, it’s always considered a mismatch. If both observations raise different exceptions, that is also considered a mismatch.
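The order of those checks can be sketched in a few lines of plain Ruby. The classify helper below is hypothetical and covers only returned values, not the exception cases; it is meant to show that ignore blocks are consulted only after a mismatch has already been detected, and that any one of them can claim the mismatch.

```ruby
# Hypothetical helper: classify one control/candidate value pair.
# Returns :matched, :ignored, or :mismatched.
def classify(control, candidate, ignore_blocks)
  return :matched if control == candidate

  # Ignore blocks are consulted only after a mismatch has been detected.
  ignored = ignore_blocks.any? { |block| block.call(control, candidate) }
  ignored ? :ignored : :mismatched
end

calls = 0 # counts how many times any ignore block is invoked
ignores = [
  ->(_control, _candidate) { calls += 1; false },
  ->(control, candidate) { calls += 1; control && !candidate }
]

matched    = classify(true, true, ignores)  # values match, no block is called
ignored    = classify(true, false, ignores) # the second block claims it
mismatched = classify(false, true, ignores) # no block claims it
```

After these three calls, the counter shows that the blocks ran only for the two pairs whose values differed.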

Enabling / disabling experiments

Sometimes you don’t want an experiment to run. Say, disabling a new codepath for anyone who isn’t staff. You can disable an experiment by setting a run_if block. If this returns false, the experiment will merely return the control value. Otherwise, it defers to the experiment’s configured enabled? method.

class DashboardController
  include Scientist

  def dashboard_items
    science "dashboard-items" do |e|
      # only run this experiment for staff members
      e.run_if { current_user.staff? }
      # ...
    end
  end
end

Ramping up experiments

As a scientist, you know it’s always important to be able to turn your experiment off, lest it run amok and result in villagers with pitchforks on your doorstep. In order to control whether or not an experiment is enabled, you must include the enabled? method in your Scientist::Experiment implementation.

class MyExperiment
  include Scientist::Experiment

  attr_accessor :name, :percent_enabled

  def initialize(name:)
    @name = name
    @percent_enabled = 100
  end

  def enabled?
    percent_enabled > 0 && rand(100) < percent_enabled
  end

  # ...

end

This code will be invoked for every method with an experiment every time, so be sensitive about its performance. For example, you can store an experiment in the database but wrap it in various levels of caching such as memcache or per-request thread-locals.
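One way to keep that lookup cheap is a small time-based cache in front of the slow source. The sketch below is only an illustration under stated assumptions: CachedPercent is a hypothetical class, the block stands in for a database query, and the injectable clock exists purely so the behavior can be shown without sleeping.

```ruby
# Hypothetical cache around a slow "percent enabled" lookup.
class CachedPercent
  def initialize(ttl_seconds,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 &lookup)
    @ttl = ttl_seconds
    @clock = clock
    @lookup = lookup
    @fetched_at = nil
  end

  # Returns the cached value, refreshing it once the TTL has elapsed.
  def value
    now = @clock.call
    if @fetched_at.nil? || now - @fetched_at >= @ttl
      @value = @lookup.call
      @fetched_at = now
    end
    @value
  end
end

lookups = 0
fake_time = 0.0
percent = CachedPercent.new(30, clock: -> { fake_time }) do
  lookups += 1 # stands in for a database query
  50
end

3.times { percent.value } # one lookup, then two cache hits
fake_time = 31.0
percent.value             # TTL expired, so the lookup runs again
```

An enabled? method could then compare rand(100) against percent.value instead of querying storage on every call.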

Publishing results

What good is science if you can’t publish your results?

You must implement the publish(result) method, and can publish data however you like. For example, timing data can be sent to graphite, and mismatches can be placed in a capped collection in redis for debugging later.

The publish method is given a Scientist::Result instance with its associated Scientist::Observations:

class MyExperiment
  include Scientist::Experiment

  # ...

  def publish(result)
    # Store the timing for the control value,
    $statsd.timing "science.#{name}.control", result.control.duration
    # for the candidate (only the first, see "Breaking the rules" below),
    $statsd.timing "science.#{name}.candidate", result.candidates.first.duration

    # and counts for match/ignore/mismatch:
    if result.matched?
      $statsd.increment "science.#{name}.matched"
    elsif result.ignored?
      $statsd.increment "science.#{name}.ignored"
    else
      $statsd.increment "science.#{name}.mismatched"
      # Finally, store mismatches in redis so they can be retrieved and examined
      # later on, for debugging and research.
      store_mismatch_data(result)
    end
  end

  def store_mismatch_data(result)
    payload = {
      :name            => name,
      :context         => context,
      :control         => observation_payload(result.control),
      :candidate       => observation_payload(result.candidates.first),
      :execution_order => result.observations.map(&:name)
    }

    key = "science.#{name}.mismatch"
    $redis.lpush key, payload
    $redis.ltrim key, 0, 1000
  end

  def observation_payload(observation)
    if observation.raised?
      {
        :exception => observation.exception.class,
        :message   => observation.exception.message,
        :backtrace => observation.exception.backtrace
      }
    else
      {
        # see "Keeping it clean" above
        :value => observation.cleaned_value
      }
    end
  end
end
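The lpush followed by ltrim is what keeps the stored mismatches bounded. As an in-memory illustration, the sketch below uses a plain Array in place of the Redis list; push_capped is a hypothetical helper, not a Redis client call. Note that the end index of LTRIM is inclusive, so trimming to 0..1000 keeps up to 1001 entries.

```ruby
# Hypothetical in-memory stand-in for LPUSH followed by LTRIM 0 last_index.
def push_capped(list, item, last_index)
  list.unshift(item)                         # like LPUSH: newest entry first
  list.replace(list.first(last_index + 1))   # like LTRIM: keep 0..last_index
  list
end

mismatches = []
(1..5).each { |n| push_capped(mismatches, n, 2) }

puts mismatches.inspect # prints "[5, 4, 3]"
```

Only the three most recent entries survive, newest first, which is the same shape the mismatch list takes in Redis.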

Testing

When running your test suite, it’s helpful to know that the experimental results always match. To help with testing, Scientist defines a raise_on_mismatches class attribute when you include Scientist::Experiment. Only do this in your test suite!

To raise on mismatches:

class MyExperiment
  include Scientist::Experiment
  # ... implementation
end

MyExperiment.raise_on_mismatches = true

Scientist will raise a Scientist::Experiment::MismatchError exception if any observations don’t match.

Custom mismatch errors

To instruct Scientist to raise a custom error instead of the default Scientist::Experiment::MismatchError:

class CustomMismatchError < Scientist::Experiment::MismatchError
  def to_s
    message = "There was a mismatch! Here's the diff:"

    diffs = result.candidates.map do |candidate|
      Diff.new(result.control, candidate)
    end.join("\n")

    "#{message}\n#{diffs}"
  end
end

science "widget-permissions" do |e|
  e.use { Report.find(id) }
  e.try { ReportService.new.fetch(id) }

  e.raise_with CustomMismatchError
end

This allows for pre-processing on mismatch error exception messages.

Handling errors

In candidate code

Scientist rescues and tracks all exceptions raised in a try or use block, including some where rescuing may cause unexpected behavior (like SystemExit or ScriptError). To rescue a more restrictive set of exceptions, modify the RESCUES list:

# default is [Exception]
Scientist::Observation::RESCUES.replace [StandardError]
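The practical effect of narrowing the list to StandardError follows from Ruby's exception hierarchy. The plain-Ruby check below does not touch Scientist; rescued_by_standard_error? is a hypothetical helper that reports whether a rescue of StandardError would catch a given class.

```ruby
# Hypothetical helper: would `rescue StandardError` catch this class?
def rescued_by_standard_error?(klass)
  klass.ancestors.include?(StandardError)
end

[ArgumentError, SystemExit, ScriptError, NotImplementedError].each do |klass|
  verdict = rescued_by_standard_error?(klass) ? "rescued" : "not rescued"
  puts "#{klass}: #{verdict} by StandardError"
end
```

ArgumentError is reported as rescued, while SystemExit, ScriptError, and NotImplementedError (a ScriptError subclass) are not.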

In a Scientist callback

If an exception is raised within any of Scientist’s internal helpers, like publish, compare, or clean, the raised method is called with the symbol name of the internal operation that failed and the exception that was raised. The default behavior of Scientist::Default is to simply re-raise the exception. Since this halts the experiment entirely, it’s often a better idea to handle this error and continue so the experiment as a whole isn’t canceled entirely:

class MyExperiment
  include Scientist::Experiment

  # ...

  def raised(operation, error)
    InternalErrorTracker.track! "science failure in #{name}: #{operation}", error
  end
end

The operations that may be handled here are:

:clean – an exception is raised in a clean block
:compare – an exception is raised in a compare block
:enabled – an exception is raised in the enabled? method
:ignore – an exception is raised in an ignore block
:publish – an exception is raised in the publish method
:run_if – an exception is raised in a run_if block

Designing an experiment

Because enabled? and run_if determine when a candidate runs, it’s impossible to guarantee that it will run every time. For this reason, Scientist is only safe for wrapping methods that aren’t changing data.

When using Scientist, we’ve found it most useful to modify both the existing and new systems simultaneously anywhere writes happen, and verify the results at read time with science. raise_on_mismatches has also been useful to ensure that the correct data was written during tests, and reviewing published mismatches has helped us find any situations we overlooked with our production data at runtime. When writing to and reading from two systems, it’s also useful to write some data reconciliation scripts to verify and clean up production data alongside any running experiments.

Noise and error rates

Keep in mind that Scientist’s try and use blocks run sequentially in random order. As such, any data upon which your code depends may change before the second block is invoked, potentially yielding a mismatch between the candidate and control return values. To calibrate your expectations with respect to false negatives arising from systemic conditions external to your proposed changes, consider starting with an experiment in which both the try and use blocks invoke the control method. Then proceed with introducing a candidate.
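The idea behind that calibration run can be illustrated without Scientist: call the same code twice per trial and count how often the two results disagree. The baseline_mismatch_rate helper below is hypothetical, and the two blocks are deliberately extreme examples (one never changes, one changes on every call).

```ruby
# Hypothetical helper: call the same block twice per trial and report how
# often the two results differ, i.e. the mismatch rate with no code change.
def baseline_mismatch_rate(trials)
  mismatches = trials.times.count { yield != yield }
  mismatches.to_f / trials
end

# Data that never changes between the two calls: no baseline noise.
stable_rate = baseline_mismatch_rate(100) { "same value" }

# Data that changes on every call: every trial is a mismatch.
ticks = 0
volatile_rate = baseline_mismatch_rate(100) { ticks += 1 }

puts "stable: #{stable_rate}, volatile: #{volatile_rate}"
# prints "stable: 0.0, volatile: 1.0"
```

Real code usually lands somewhere between those extremes, and that measured rate is the floor below which mismatches cannot be blamed on the candidate.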

Finishing an experiment

As your candidate behavior converges on the controls, you’ll start thinking about removing an experiment and using the new behavior.

If there are any ignore blocks, the candidate behavior is guaranteed to be different. If this is unacceptable, you’ll need to remove the ignore blocks and resolve any ongoing mismatches in behavior until the observations match perfectly every time.
When removing a read-behavior experiment, it’s a good idea to keep any write-side duplication between an old and new system in place until well after the new behavior has been in production, in case you need to roll back.

Breaking the rules

Sometimes scientists just gotta do weird stuff. We understand.

Ignoring results entirely

Science is useful even when all you care about is the timing data or even whether or not a new code path blew up. If you have the ability to incrementally control how often an experiment runs via your enabled? method, you can use it to silently and carefully test new code paths and ignore the results altogether. You can do this by setting ignore { true }, or for greater efficiency, compare { true }.

This will still log mismatches if any exceptions are raised, but will disregard the values entirely.

Trying more than one thing

It’s not usually a good idea to try more than one alternative simultaneously. Behavior isn’t guaranteed to be isolated, and reporting and visualization get quite a bit harder. Still, it’s sometimes useful.

To try more than one alternative at once, add names to some try blocks:

require "scientist"

class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |e|
      e.use { model.check_user(user).valid? }             # old way
      e.try("api") { user.can?(:read, model) }            # new service API
      e.try("raw-sql") { user.can_sql?(:read, model) }    # raw query
    end
  end
end

When the experiment runs, all candidate behaviors are tested and each candidate observation is compared with the control in turn.

No control, just candidates

Define the candidates with named try blocks, omit a use, and pass a candidate name to run:

experiment = MyExperiment.new("various-ways") do |e|
  e.try("first-way")  { ... }
  e.try("second-way") { ... }
end

experiment.run("second-way")

The science helper also knows this trick:

science "various-ways", run: "first-way" do |e|
  e.try("first-way")  { ... }
  e.try("second-way") { ... }
end

Providing fake timing data

If you’re writing tests that depend on specific timing values, you can provide canned durations using the fabricate_durations_for_testing_purposes method, and Scientist will report these in Scientist::Observation#duration instead of the actual execution times.

science "absolutely-nothing-suspicious-happening-here" do |e|
  e.use { ... } # "control"
  e.try { ... } # "candidate"
  e.fabricate_durations_for_testing_purposes( "control" => 1.0, "candidate" => 0.5 )
end

fabricate_durations_for_testing_purposes takes a Hash of duration values, keyed by behavior names. (By default, Scientist uses "control" and "candidate", but if you override these as shown in "Trying more than one thing" or "No control, just candidates", use matching names here.) If a name is not provided, the actual execution time will be reported instead.

Like Scientist::Experiment#cleaner, this probably won’t come up in normal usage. It’s here to make it easier to test code that extends Scientist.

Without including Scientist

If you need to use Scientist in a place where you aren’t able to include the Scientist module, you can call Scientist.run:

Scientist.run "widget-permissions" do |e|
  e.use { model.check_user(user).valid? }
  e.try { user.can?(:read, model) }
end

Hacking

Be on a Unixy box. Make sure a modern Bundler is available. script/test runs the unit tests. All development dependencies are installed automatically. Scientist requires Ruby 2.3 or newer.