Sounds fun, Hacker News

I played with the web audio API for the first time recently, so I thought I’d write up what I learned. I think that’s my job or something.

Playing a sound

The simplest demonstrable thing we can do with web audio is “play a sound”. But to do that, we first need to load & decode something:

// The context is connected to the device speakers.
// You only need one of these per document.
const context = new AudioContext();
// Fetch the file
fetch('sound.mp4')
  // Read it into memory as an arrayBuffer
  .then(response => response.arrayBuffer())
  // Turn it from mp3/aac/whatever into raw audio data
  .then(arrayBuffer => context.decodeAudioData(arrayBuffer))
  .then(audioBuffer => {
    // Now we’re ready to play!
  });

Unfortunately we need to work around a few things in Safari. We need to use webkitAudioContext, as Safari doesn't support the unprefixed version. It doesn't support fetch yet (it's in development), so we'll need to use XHR. And decodeAudioData does not support promises, so we'll need to polyfill that.
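Here's a rough sketch of what those workarounds could look like. The helper names (createAudioContext, loadArrayBuffer, decodeAudioDataAsync) are made up for illustration, and this assumes the callback form of decodeAudioData is available where the promise form isn't.

```javascript
// Hypothetical helper: use whichever constructor the browser provides.
// `globalObject` would be `window` in a page.
function createAudioContext(globalObject) {
  const Ctor = globalObject.AudioContext || globalObject.webkitAudioContext;
  if (!Ctor) throw new Error('Web Audio is not supported here');
  return new Ctor();
}

// Hypothetical helper: load a file as an ArrayBuffer using XHR instead of fetch.
function loadArrayBuffer(url) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = () => resolve(xhr.response);
    xhr.onerror = () => reject(new Error('Network error loading ' + url));
    xhr.send();
  });
}

// Hypothetical helper: wrap the callback form of decodeAudioData in a promise.
function decodeAudioDataAsync(context, arrayBuffer) {
  return new Promise((resolve, reject) => {
    context.decodeAudioData(arrayBuffer, resolve, reject);
  });
}
```

In a page, the chain would then be something like loadArrayBuffer('sound.mp4').then(buffer => decodeAudioDataAsync(context, buffer)).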

But once we’ve got our audio buffer, we can play it:

// Create a source:
// This represents a playback head.
const source = context.createBufferSource();
// Give it the audio data we loaded:
source.buffer = audioBuffer;
// Plug it into the output:
source.connect(context.destination);
// And off we go!
source.start();

Job done!

So yeah, it’s way more complicated than just using an audio element to play a sound, but web audio can do so much more. The amount of control web audio gives you is great fun, but also kinda daunting. In this post I'm just going to scratch the surface, and look at how to loop and queue sounds.

The Big Web Quiz

At Chrome Dev Summit Paul & I ran a web-based interactive quiz between talks.

CSS properties on the Big Web Quiz

We tried to make it as ridiculous as possible, and the music was a big part of that. The music was produced by Plan8, and it only took them a day to compose (we misread the license on a piece of music we were going to use, so the deadline was our fault. Anyway, the music they made is way better). They also have JS libraries for scheduling audio, but hey, I was in the mood for some procrastination, so I did it myself.

Switching between clips

The music in the Big Web Quiz has three phases, and we wanted to switch between them during questions. Using the code above, I loaded three buffers, phase1AudioBuffer, phase2AudioBuffer, and stabAudioBuffer, each representing a different phase of Big Web Quiz’s music.

A naive solution is to play phase 1, then later stop it and play phase 2:


const phase1Source = context.createBufferSource();
phase1Source.buffer = phase1AudioBuffer;
phase1Source.connect(context.destination);
phase1Source.start();
// Then later…
const phase2Source = context.createBufferSource();
phase2Source.buffer = phase2AudioBuffer;
phase2Source.connect(context.destination);
// Stop phase 1
phase1Source.stop();
// Start phase 2
phase2Source.start();

This doesn’t really work. Switching between phases is jarring for a few reasons …

Sound may not play instantly

Even though we have our audio data loaded in memory, there’s still a gap between us calling start() and the audio actually playing.

This is fine if you want the sound to play as soon as possible and don’t mind if it’s a few milliseconds out, such as playing a sound in a game when the player collects a coin, but when synchronising two clips things need to be precise.

To do anything precise with audio, you need to schedule things in advance. Both start and stop take an optional number, the time to actually start/stop, and context.currentTime gives you the current time as far as the audio context is concerned.

How much advance notice you have to give depends on hardware, but Chris Wilson reliably informs me that a quarter of a second is super-safe more-than-enough, even for slow hardware.



const safetyBuffer = 0.25;
const switchTime = context.currentTime + safetyBuffer;
phase1Source.stop(switchTime);
phase2Source.start(switchTime);

A quarter of a second is a long delay, but in this case synchronisation is more important to us than immediacy.

Note: In the web audio API, time is in seconds, whereas most other web APIs use milliseconds.
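If you end up doing this in a few places, the scheduling could be wrapped up in a small function. This is only a sketch: scheduleSwitch is a name I've made up, and the 0.25 default is the quarter-second safety margin mentioned above, in seconds rather than milliseconds.

```javascript
// Hypothetical helper: stop one source and start another at the same
// scheduled moment, a little way into the future.
function scheduleSwitch(context, currentSource, nextSource, safetyBuffer = 0.25) {
  // Audio context times are in seconds.
  const switchTime = context.currentTime + safetyBuffer;
  currentSource.stop(switchTime);
  nextSource.start(switchTime);
  // Return the time so the caller can track when the new clip began.
  return switchTime;
}
```

Returning the switch time is handy later on, because the API won't tell you when a source actually started.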

But there's another delay to tackle ...

Different decoders are different

Encoding audio down to formats like MP3 or AAC is a lossy process, but you at least get to pick the encoder. When you use decodeAudioData you're relying on whatever decoder the browser uses, and this may come as a shock, but sometimes different browsers do things differently.

Here's the start / end of an AAC clip decoded by your browser:

The original clip is gapless at the start / end, but if you're in Chrome stable, Firefox, or Edge, you'll see a huge gap. By huge I mean milliseconds, but y'know, that's a big deal when we're trying to instantly switch between two clips.

The gap is almost gone in Chrome Canary. Safari on the other hand gets it spot-on, no gap at all.

In the first draft of this article I congratulated Safari on a job well done, but actual expert Paul Adenot from Mozilla dropped a few knowledge bombs on me (in a friendly way of course).

The gap at the start is specified by the encoder as metadata. From Apple's documentation :

… encoders add at least 2112 samples of silence before the first 'true' audio sample, and often add more. This is called variously “priming”, “priming samples”, or “encoder delay”…

Therefore, a playback system must trim the silent priming samples to preserve correct synchronization. This trimming by the playback system should be done in two places:

When playback first begins

When the playback position is moved to another location. For example, the user skips ahead or back to another part of the media and begins playback from that new location.

The question is, should the browser remove the "priming samples" as part of decodeAudioData, or are we (as users of the web audio API) the "playback system", meaning we have to deal with it?

I still feel that Safari is doing the right thing here, especially as finding out the number of priming samples from the metadata is really non-trivial. To try and bring some consistency here, I've filed an issue with the spec.


    In the meantime, we can work around the gap by finding out how long it is:

function findStartGapDuration(audioBuffer) {
  // Get the raw audio data for the left & right channels.
  const l = audioBuffer.getChannelData(0);
  const r = audioBuffer.getChannelData(1);
  // Each is an array of numbers between -1 and 1 describing
  // the waveform, sample by sample.

  // Now to figure out how long both channels remain at 0:
  for (let i = 0; i < l.length; i++) {
    if (l[i] || r[i]) {
      // Now we know which sample is non-zero, but we want
      // the gap in seconds, not samples. Thankfully sampleRate
      // gives us the number of samples per second.
      return i / audioBuffer.sampleRate;
    }
  }

  // Hmm, the clip is entirely silent
  return audioBuffer.duration;
}
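The function above assumes a stereo clip. If your clips might be mono, or have more channels, something like this sketch could work instead. The name findStartGapDurationAnyChannels and the threshold parameter are my own additions, not part of the web audio API.

```javascript
// Hypothetical variation: works for any number of channels.
// `threshold` is an assumed extra that lets you ignore very quiet noise;
// the default of 0 matches the exact-zero check in the stereo version.
function findStartGapDurationAnyChannels(audioBuffer, threshold = 0) {
  const channels = [];
  for (let c = 0; c < audioBuffer.numberOfChannels; c++) {
    channels.push(audioBuffer.getChannelData(c));
  }

  for (let i = 0; i < audioBuffer.length; i++) {
    for (const data of channels) {
      if (Math.abs(data[i]) > threshold) {
        // Convert the sample index into seconds.
        return i / audioBuffer.sampleRate;
      }
    }
  }

  // The clip is entirely silent.
  return audioBuffer.duration;
}
```

Be careful with a non-zero threshold: if the clip genuinely starts quietly, you'd trim real audio along with the gap.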

Once we have the gap, we can use start's second parameter (the offset into the buffer) to start playback at that point, after the silence:



const phase1StartGap = findStartGapDuration(phase1AudioBuffer);
const phase1Source = context.createBufferSource();
phase1Source.buffer = phase1AudioBuffer;
phase1Source.connect(context.destination);
// Cater for the gap:
phase1Source.start(context.currentTime + safetyBuffer, phase1StartGap);
// Then later…
const phase2StartGap = findStartGapDuration(phase2AudioBuffer);
const phase2Source = context.createBufferSource();
phase2Source.buffer = phase2AudioBuffer;
phase2Source.connect(context.destination);
const switchTime = context.currentTime + safetyBuffer;
// Stop phase 1
phase1Source.stop(switchTime);
// Start phase 2
phase2Source.start(switchTime, phase2StartGap);



    And here's the result:

    Better, but not perfect. Depending on when you press the button, the switch from phase 2 to the end stab can feel mistimed, but we can fix that…

    Musically-aware scheduling

    Ideally we want the phases to switch right at the end of a musical bar. Phase 1 is 296 bpm, and phase 2 is 296 bpm, so we can figure out the duration of each bar:

function getBarDuration(bpm, beatsPerBar) {
  // 60 seconds per minute, divided by beats per minute, gives seconds per beat.
  return 60 / bpm * beatsPerBar;
}

const phase1BarDuration = getBarDuration(296, 4);
const phase2BarDuration = getBarDuration(296, 4);

We want to switch the phases at the end of the next bar, unless that's less than our safetyBuffer, in which case we want to switch at the end of the following bar.

function getPhaseSwitchTime(currentTime, phaseStartTime, barDuration) {
  // How long the phase has been playing:
  const phasePlaybackPosition = currentTime - phaseStartTime;
  // How long has it been playing the current bar:
  const positionWithinBar = phasePlaybackPosition % barDuration;
  // How long until the next bar:
  let untilSwitch = barDuration - positionWithinBar;
  // If it's less than our safetyBuffer, add another bar:
  if (untilSwitch < safetyBuffer) untilSwitch += barDuration;
  // Add on the current time:
  return untilSwitch + currentTime;
}
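To see the arithmetic in action, here's a worked example with made-up numbers. The 120 bpm tempo is purely illustrative (it is not the quiz music's tempo), and getSwitchTime is a hypothetical variation that takes safetyBuffer as a parameter so the snippet stands alone.

```javascript
function getBarDuration(bpm, beatsPerBar) {
  // Seconds per beat, multiplied by beats per bar.
  return 60 / bpm * beatsPerBar;
}

// Hypothetical variation of the switch-time calculation with the
// safety margin passed in rather than read from an outer variable.
function getSwitchTime(currentTime, phaseStartTime, barDuration, safetyBuffer) {
  const positionWithinBar = (currentTime - phaseStartTime) % barDuration;
  let untilSwitch = barDuration - positionWithinBar;
  // Too close to the bar line to schedule safely? Wait one more bar.
  if (untilSwitch < safetyBuffer) untilSwitch += barDuration;
  return currentTime + untilSwitch;
}

// At an assumed 120 bpm in 4/4, each bar lasts 2 seconds.
const barDuration = getBarDuration(120, 4);

// Phase started at t=1, so bar lines fall at 1, 3, 5, 7…
// At t=4.5 the next bar line (5) is 0.5s away, which is enough notice.
const comfortable = getSwitchTime(4.5, 1, barDuration, 0.25); // 5

// At t=4.875 the next bar line is only 0.125s away, so we skip to 7.
const tooClose = getSwitchTime(4.875, 1, barDuration, 0.25); // 7
```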

Unfortunately the web audio API does not tell us the current playback position of a source (it might eventually), so we have to track that ourselves:


const phase1StartTime = context.currentTime + safetyBuffer;
phase1Source.start(phase1StartTime, phase1StartGap);
// Then later…
const phase2StartTime = getPhaseSwitchTime(context.currentTime, phase1StartTime, phase1BarDuration);
phase1Source.stop(phase2StartTime);
phase2Source.start(phase2StartTime, phase2StartGap);





    Job done! Here it is:

Sometimes switching between clips can cause a click if the samples don't join at a zero value. If you get this, you can use a gain node to create a really short, imperceptible, fade-out and fade-in.
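I didn't need this for the quiz, but a sketch of the fade idea might look like the following. It assumes each source is routed through its own gain node (created with context.createGain()) rather than straight to the destination. The function name and the 5 millisecond default are assumptions, so tune the duration by ear.

```javascript
// Hypothetical helper: fade the old clip out just before the switch,
// and fade the new clip in just after it.
// Assumed routing, done elsewhere:
//   oldSource.connect(oldGainNode); oldGainNode.connect(context.destination);
//   newSource.connect(newGainNode); newGainNode.connect(context.destination);
function scheduleSwitchFades(oldGainNode, newGainNode, switchTime, fadeDuration = 0.005) {
  // Hold full volume until the fade begins, then ramp down to silence
  // so the old clip reaches zero exactly at the switch.
  oldGainNode.gain.setValueAtTime(1, switchTime - fadeDuration);
  oldGainNode.gain.linearRampToValueAtTime(0, switchTime);

  // Start the new clip silent, then ramp up to full volume.
  newGainNode.gain.setValueAtTime(0, switchTime);
  newGainNode.gain.linearRampToValueAtTime(1, switchTime + fadeDuration);
}
```

The sources are still started and stopped at switchTime as before. The gain nodes only smooth over the join.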

    Dynamic looping

Switching multiple clips isn't the only way to create multi-phase audio. BEHOLD:

    Credit: Sonic 2, chemical plant zone, Protostar remix .

    This is a single source that loops, but the loop-points change dynamically. Compared to what we've done already, looping a clip is pretty simple:

// Cater for buggy AAC decoders as before:
const sonicStartGap = findStartGapDuration(sonicAudioBuffer);
// Create the source:
const sonicSource = context.createBufferSource();
sonicSource.buffer = sonicAudioBuffer;
sonicSource.connect(context.destination);
// Loop it!
sonicSource.loop = true;
// Set loop points:
sonicSource.loopStart = loopStartTime + sonicStartGap;
sonicSource.loopEnd = loopEndTime + sonicStartGap;
// Play!
sonicSource.start(0, sonicStartGap);

    And changing those loop points is just…

sonicSource.loopStart = anotherLoopStartTime + sonicStartGap;
sonicSource.loopEnd = anotherLoopEndTime + sonicStartGap;


    Although discovering the loop points is easier said than done.

    Finding the loop points

Tools like Audacity (free) and Adobe Audition (not so free) are great for chopping and looping audio.

    Once we've found the loop points, we need to find the sample they start & end on. This is the most accurate measurement we'll get.

Selecting by sample in Audacity

const loopPoints = [
  {start: 328948, end: 656828},
  {start: 985007, end: 1314119},
  {start: 1643272, end: 1972421},
  {start: 1972421, end: 2137288},
];


loopStart and loopEnd want the time in seconds, so we convert them:


const loopPointTimes = loopPoints.map(loop => ({
  start: loop.start / 1314119 + sonicStartGap,
  end: loop.end / 1314119 + sonicStartGap
}));

The divisor, 1314119, needs to be replaced with the sample rate of the clip as viewed in Audacity. Don't do what I did & use buffer.sampleRate, as the decoded sample rate can be different to the sample rate of the file. Audio is decoded to context.sampleRate, which is not the same on my Mac as it is on my phone.
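To make the file's sample rate harder to get wrong, the conversion could take it as an explicit argument. This is just a sketch: samplesToSeconds and toLoopTimes are made-up names, and the 48000 in the usage note is an assumed file sample rate, so use whatever Audacity reports for your clip.

```javascript
// Hypothetical helper: convert a sample position in the original file
// into seconds, shifted by the decoder's start gap.
function samplesToSeconds(samplePosition, fileSampleRate, startGap = 0) {
  return samplePosition / fileSampleRate + startGap;
}

// Hypothetical helper: convert a list of {start, end} sample positions.
function toLoopTimes(loopPoints, fileSampleRate, startGap = 0) {
  return loopPoints.map(loop => ({
    start: samplesToSeconds(loop.start, fileSampleRate, startGap),
    end: samplesToSeconds(loop.end, fileSampleRate, startGap),
  }));
}
```

For example, toLoopTimes(loopPoints, 48000, sonicStartGap) would treat the file as 48 kHz, whatever rate the browser decoded it to.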

    Looping back to an earlier point

At the end of the demo above, the clip loops back to an earlier point. Unfortunately, if you set loopEnd to a point earlier than the current playback point, it immediately goes back to loopStart, whereas we want it to play through to the end, then go back to an earlier loop.

The least hacky way to do this would be to stop sonicSource looping, and queue up a new sonicSource2 to start looping once sonicSource reaches its natural finish.

However, to do this, we'd need to know the current playback position of sonicSource, and as I mentioned earlier, this feature hasn't landed yet. We can't even reliably work around this: the source has been looping all over the place, and we can't be sure each write to loopStart and loopEnd made it to the sound card in time. I'm hacking it for the purposes of the visualisations above, but it isn't accurate enough for sound.

    To work around this we make two changes to the loop. We loop from the start of the earlier loop, right to the end of the clip. Then, once the clip has played past the end, we change loopEnd to the end of the earlier loop.

// Current loop start
const currentLoopStart = sonicSource.loopStart;
// The earlier loop we want to move to.
const targetLoop = loopPointTimes[1];
// The point we want to reach before looping back.
