
JSON Parser with JavaScript, Hacker News


This week's interview question on Cassidoo's weekly newsletter is:

Write a function that takes in a string of valid JSON and converts it to an object (or whatever your chosen language uses, dicts, maps, etc). Example input:

    fakeParseJSON('{ "data": { "fish": "cake", "array": [1,2,3], "children": [ { "something": "else" }, { "candy": "cane" }, { "sponge": "bob" } ] } }')

At one point, I was tempted to just write:

    const fakeParseJSON = JSON.parse;

But then I thought, I've written quite a few articles about AST:

  • Creating custom JavaScript syntax with Babel
  • Step-by-step guide for writing a custom babel transformation
  • Manipulating AST with JavaScript

which cover the overview of the compiler pipeline, as well as how to manipulate AST, but I haven't covered much on how to implement a parser.

That's because implementing a JavaScript compiler in an article is too daunting a task for me.

Well, fret not. JSON is also a language. It has its own grammar, which you can look up in the specification. The knowledge and techniques you need to write a JSON parser are transferable to writing a JS parser.

So, let's start writing a JSON parser!

The specification presents the grammar in two forms: a railroad diagram and a text-based grammar. Both are equivalent.

One is visual and one is text-based. The text-based grammar syntax, Backus-Naur Form, is usually fed to another parser that parses this grammar and generates a parser for it. Talk about parser-ception! 🤯

In this article, we will focus on the railroad diagram, because it is visual and seems friendlier to me.

Let's look at the first railroad diagram:

[Railroad diagram for "object". Image source: https://www.json.org/img/object.png]

So this is the grammar for "object" in JSON.

We start from the left, follow the arrows, and end at the right.

The circles, e.g. "{", ",", ":" and "}", are the characters, and the boxes, e.g. whitespace, string and value, are placeholders for other grammars. So to parse the "whitespace", we need to look at the grammar for "whitespace".
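The "whitespace" box is the simplest of those placeholders: in JSON it is zero or more of space, line feed, carriage return and horizontal tab. A minimal sketch of skipping it, written as a standalone function that takes the string and a start index (the name skipWhitespaceFrom and its signature are hypothetical; inside the parser this will become a closure over the shared index):

```javascript
// JSON whitespace: space, line feed, carriage return, horizontal tab.
const JSON_WHITESPACE = new Set([' ', '\n', '\r', '\t']);

// Hypothetical helper: returns the index of the first non-whitespace
// character at or after `start`. Zero whitespace is allowed, so it never throws.
function skipWhitespaceFrom(str, start) {
  let i = start;
  while (i < str.length && JSON_WHITESPACE.has(str[i])) {
    i++;
  }
  return i;
}

console.log(skipWhitespaceFrom('{ \n\t "a": 1}', 1)); // 5
```

Because the grammar allows zero whitespace characters, the helper simply returns `start` unchanged when there is nothing to skip.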

So, starting from the left, for an object, the first character has to be an open curly bracket, "{", and then we have 2 options from here:

  • whitespace → "}" → end, or
  • whitespace → string → whitespace → ":" → value → "}" → end

Of course, when you reach "value", you can choose to go to:

  • → "}" → end, or
  • → "," → whitespace → ... → value

and you can keep looping until you decide to go to → "}" → end.
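Walking the diagram can be made concrete with a small tracer. This is a hypothetical sketch, not the parser built below: it follows the "object" railroad diagram and records each node it passes through. To keep it short it assumes keys and values are plain strings without escape sequences, and it treats the "value" box as whitespace, a string, whitespace.

```javascript
// Hypothetical tracer for the "object" railroad diagram.
// Assumption: keys and values are plain strings with no escape sequences.
function traceObject(str) {
  let i = 0;
  const path = [];

  function expect(ch) {
    if (str[i] !== ch) throw new Error(`Expected "${ch}" at position ${i}`);
    path.push(ch);
    i++;
  }
  function whitespace() {
    while (i < str.length && ' \n\r\t'.includes(str[i])) i++;
    path.push('whitespace');
  }
  function string(label) {
    if (str[i] !== '"') throw new Error(`Expected a string at position ${i}`);
    const end = str.indexOf('"', i + 1);
    if (end === -1) throw new Error('Unterminated string');
    i = end + 1;
    path.push(label);
  }

  expect('{');
  whitespace();
  if (str[i] === '}') {
    // Option 1: { -> whitespace -> } -> end
    expect('}');
    return path;
  }
  // Option 2: string -> whitespace -> : -> value, looping back on ","
  while (true) {
    string('string');
    whitespace();
    expect(':');
    whitespace(); // the "value" box: whitespace, the value, whitespace
    string('value');
    whitespace();
    if (str[i] !== ',') break;
    expect(',');
    whitespace();
  }
  expect('}');
  return path;
}

console.log(traceObject('{"a":"b"}').join(' → '));
// { → whitespace → string → whitespace → : → whitespace → value → whitespace → }
```

The `while (true)` loop is the arrow that curls back through ","; leaving it is the decision to go to "}".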

So, I guess we are now acquainted with the railroad diagram. Let's carry on to the next section.


Let's start with the following structure:

    function fakeParseJSON(str) {
      let i = 0;
      // the parsing functions will go here
    }

We initialise i as the index of the current character; we will stop as soon as i reaches the end of str.
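Keeping i in the outer function and defining every helper inside it means all helpers share one cursor: each helper reads str[i] and advances i, and its caller sees the new position. A small illustration of the pattern on a toy grammar chosen only for this example (space-separated non-negative integers; parseNumbers and parseDigits are hypothetical names):

```javascript
// Toy example of the shared-cursor pattern: parse "12 34  5" into [12, 34, 5].
function parseNumbers(str) {
  let i = 0; // index of the current character, shared by every helper below

  function skipWhitespace() {
    while (i < str.length && str[i] === ' ') i++;
  }

  function parseDigits() {
    const start = i;
    while (i < str.length && str[i] >= '0' && str[i] <= '9') i++;
    if (i === start) throw new Error(`Expected a digit at position ${i}`);
    return Number(str.slice(start, i));
  }

  const result = [];
  skipWhitespace();
  // Stop as soon as i reaches the end of str.
  while (i < str.length) {
    result.push(parseDigits());
    skipWhitespace();
  }
  return result;
}

console.log(parseNumbers('12 34  5')); // [ 12, 34, 5 ]
```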

Let's implement the grammar for "object":

    function fakeParseJSON(str) {
      let i = 0;
      function parseObject() {
        if (str[i] === '{') {
          i++;
          skipWhitespace();

          // if it is not '}',
          // we take the path of string -> whitespace -> ':' -> value -> ...
          while (str[i] !== '}') {
            const key = parseString();
            skipWhitespace();
            eatColon();
            const value = parseValue();
          }
        }
      }
    }

In parseObject, we call the parsers of other grammars, like "string" and "whitespace"; once we implement them, everything should work 🤞.
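For a taste of one of those other parsers, here is a deliberately simplified sketch of a string parser. It assumes the string contains no escape sequences (real JSON allows them, e.g. \" and \n), so treat it as a placeholder rather than a complete "string" grammar. It is standalone: it takes the input and a start index and returns the value plus the index just past the closing quote (parseSimpleString and this signature are hypothetical; the article's parseString reads and advances the shared i instead).

```javascript
// Simplified string parser. Assumption: no escape sequences are supported.
// Returns { value, end }, where `end` is the index just after the closing quote.
function parseSimpleString(str, start) {
  if (str[start] !== '"') {
    throw new Error(`Expected '"' at position ${start}`);
  }
  let i = start + 1;
  let value = '';
  while (i < str.length && str[i] !== '"') {
    if (str[i] === '\\') {
      throw new Error('Escape sequences are not supported in this sketch');
    }
    value += str[i];
    i++;
  }
  if (i >= str.length) {
    throw new Error('Unterminated string');
  }
  return { value, end: i + 1 };
}

console.log(parseSimpleString('{"fish":"cake"}', 1)); // { value: 'fish', end: 7 }
```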

One thing that I forgot to add is the comma, ",". The "," only appears before we start the second loop of whitespace → string → whitespace → ":" → ...

Based on that, we add the following lines:

    function fakeParseJSON(str) {
      let i = 0;
      function parseObject() {
        if (str[i] === '{') {
          i++;
          skipWhitespace();

          let initial = true;
          // if it is not '}',
          // we take the path of string -> whitespace -> ':' -> value -> ...
          while (str[i] !== '}') {
            // a ',' is only expected before the second and later key-value pairs
            if (!initial) {
              eatComma();
              skipWhitespace();
            }
            const key = parseString();
            skipWhitespace();
            eatColon();
            const value = parseValue();
            initial = false;
          }
          // move to the next character after '}'
          i++;
        }
      }
    }
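To check the comma handling end to end, here is a runnable sketch that fills in the helpers not written yet. Everything beyond parseObject's control flow is an assumption made for this example: parseString ignores escape sequences, parseValue only understands strings, non-negative integers and nested objects, and parseObject collects the pairs into a plain object, which the version above does not do yet.

```javascript
// Runnable sketch of the comma-aware parseObject with simplified helpers.
// Assumptions: strings have no escapes; values are strings, non-negative
// integers, or nested objects.
function fakeParseJSON(str) {
  let i = 0;

  function skipWhitespace() {
    while (i < str.length && ' \n\r\t'.includes(str[i])) i++;
  }
  function eatComma() {
    if (str[i] !== ',') throw new Error(`Expected "," at position ${i}`);
    i++;
  }
  function eatColon() {
    if (str[i] !== ':') throw new Error(`Expected ":" at position ${i}`);
    i++;
  }
  function parseString() {
    if (str[i] !== '"') throw new Error(`Expected '"' at position ${i}`);
    const end = str.indexOf('"', i + 1);
    if (end === -1) throw new Error('Unterminated string');
    const value = str.slice(i + 1, end);
    i = end + 1;
    return value;
  }
  function parseValue() {
    skipWhitespace();
    let value;
    if (str[i] === '"') {
      value = parseString();
    } else if (str[i] === '{') {
      value = parseObject();
    } else {
      const start = i;
      while (i < str.length && str[i] >= '0' && str[i] <= '9') i++;
      if (i === start) throw new Error(`Unexpected character at position ${i}`);
      value = Number(str.slice(start, i));
    }
    skipWhitespace();
    return value;
  }
  function parseObject() {
    if (str[i] !== '{') throw new Error(`Expected "{" at position ${i}`);
    const result = {};
    i++;
    skipWhitespace();
    let initial = true;
    while (str[i] !== '}') {
      if (!initial) {
        eatComma();
        skipWhitespace();
      }
      const key = parseString();
      skipWhitespace();
      eatColon();
      result[key] = parseValue();
      initial = false;
    }
    i++; // move past '}'
    return result;
  }

  const value = parseValue();
  if (i !== str.length) throw new Error(`Unexpected trailing input at position ${i}`);
  return value;
}

console.log(fakeParseJSON('{ "fish": "cake", "n": 42, "inner": { "a": "b" } }'));
```

A missing comma, as in '{"a":"b" "c":"d"}', reaches eatComma on the second pass of the loop and throws there.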

Some naming conventions:

  • We call parseSomething when we parse the code based on the grammar and use the return value.
  • We call eatSomething when we expect the character(s) to be there, but we are not using the character(s).
  • We call skipSomething when we are okay if the character(s) is not there.

Let's implement eatComma and eatColon:

    function fakeParseJSON(str) {
      // ...
      function eatComma() {
        if (str[i] !== ',') {
          throw new Error('Expected ",".');
        }
        i++;
      }

      function eatColon() {
        if (str[i] !== ':') {
          throw new Error('Expected ":".');
        }
        i++;
      }
    }
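eatComma and eatColon differ only in the character they expect. One possible refactor, a hypothetical helper rather than something the article defines, is a single eat function that also reports the position and what it actually found. To run it standalone, a plain cursor object stands in for the closure variables str and i.

```javascript
// Hypothetical refactor: one "eat" helper instead of eatComma / eatColon.
// `cursor` holds the input and the shared index, standing in for str and i.
function eat(cursor, expected) {
  const actual = cursor.str[cursor.i];
  if (actual !== expected) {
    const found = actual === undefined ? 'end of input' : `"${actual}"`;
    throw new Error(`Expected "${expected}" at position ${cursor.i}, found ${found}.`);
  }
  cursor.i++; // eat: the character is required, but its value is not used
}

const eatComma = (cursor) => eat(cursor, ',');
const eatColon = (cursor) => eat(cursor, ':');

const cursor = { str: '"a": 1', i: 3 };
eatColon(cursor);
console.log(cursor.i); // 4

try {
  eatComma(cursor);
} catch (err) {
  console.log(err.message); // Expected "," at position 4, found " ".
}
```

Including the position and the offending character in the message is a design choice; the article's shorter messages work just as well while the parser is being built.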

So we have finished implementing the parseObject