| Parsing |
| ======= |
| |
| Parsing in Sky is a strict pipeline consisting of four stages: |
| |
| - decoding, which converts incoming bytes into Unicode characters |
| using UTF-8 |
| |
| - normalising, which converts certain sequences of characters (line |
| endings and U+0000) into a canonical form |
| |
| - tokenising, which converts these characters into tokens |
| |
| - tree construction, which converts these tokens into a tree of nodes |
| |
| Later stages cannot affect earlier stages. |
| |
| When a sequence of bytes is to be parsed, there is always a defined |
| _parsing context_, which is either "application" or "module". |
| |
| |
| Decoding stage |
| -------------- |
| |
| To decode a sequence of bytes _bytes_ for parsing, the [UTF-8 |
| decoder](https://encoding.spec.whatwg.org/#utf-8-decoder) must be used |
| to transform _bytes_ into a sequence of characters _characters_. |
| |
| This sequence must then be passed to the normalisation stage. |
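| |
| The decoding step can be sketched in Python. This is only an |
| illustration: it assumes that Python's built-in UTF-8 codec with |
| `errors="replace"` is an acceptable stand-in for the referenced UTF-8 |
| decoder, and it does not address byte order marks. The function name |
| `decode` is hypothetical. |
| |
```python
def decode(data: bytes) -> str:
    """Turn raw bytes into characters for the normalisation stage.

    Assumption: invalid byte sequences become U+FFFD, which is what
    Python's "replace" error handler does.
    """
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode(b"SKY MODULE\n"))
```
| |
| Malformed input never raises here; it yields U+FFFD characters, so |
| the later stages always receive a sequence of characters. |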
| |
| |
| Normalisation stage |
| ------------------- |
| |
| To normalise a sequence of characters, apply the following rules: |
| |
| * Any U+000D character followed by a U+000A character must be removed. |
| |
| * Any U+000D character not followed by a U+000A character must be |
| converted to a U+000A character. |
| |
| * Any U+0000 character must be converted to a U+FFFD character. |
| |
| The converted sequence of characters must then be passed to the |
| tokenisation stage. |
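| |
| The three rules above can be sketched as a chain of string |
| replacements. This is a minimal Python sketch; the function name |
| `normalise` is hypothetical. |
| |
```python
def normalise(characters: str) -> str:
    """Apply the three normalisation rules to a decoded string."""
    # A U+000D followed by U+000A is removed, leaving only the U+000A.
    characters = characters.replace("\r\n", "\n")
    # Any remaining U+000D is not followed by U+000A; it becomes U+000A.
    characters = characters.replace("\r", "\n")
    # U+0000 becomes U+FFFD.
    return characters.replace("\x00", "\ufffd")


if __name__ == "__main__":
    print(repr(normalise("a\r\nb\rc\x00")))
```
| |
| The order of the first two replacements matters: handling the |
| two-character sequence first ensures that a lone U+000D is the only |
| kind left for the second rule. |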
| |
| |
| Tokenisation stage |
| ------------------ |
| |
| To tokenise a sequence of characters, a state machine is used. |
| |
| The state machine must begin in the **signature** state. |
| |
| Each character in turn must be processed according to the rules of the |
| state at the time the character is processed. A character is processed |
| once it has been _consumed_. This produces a stream of tokens; the |
| tokens must be passed to the tree construction stage. |
| |
| When the last character is consumed, the tokeniser ends. |
| |
| |
| ### Expecting a string ### |
| |
| When the user agent is to _expect a string_, it must run these steps: |
| |
| 1. Let _expectation_ be the string to expect. When this string is |
| indexed, the first character has index 0. |
| |
| 2. Assertion: The first character in _expectation_ is the current |
| character, and _expectation_ has more than one character. |
| |
| 3. Consume the current character. |
| |
| 4. Let _index_ be 1. |
| |
| 5. Let _success_ and _failure_ be the states specified for success and |
| failure respectively. |
| |
| 6. Switch to the **expect a string** state. |
| |
| |
| ### Tokeniser states ### |
| |
| #### **Signature** state #### |
| |
| If the current character is... |
| |
| * '```#```': If the _parsing context_ is not "application", switch to |
| the _failed signature_ state. Otherwise, expect the string |
| "```#!mojo mojo:sky```", with _after signature_ as the _success_ |
| state and _failed signature_ as the _failure_ state. |
| |
| * '```S```': If the _parsing context_ is not "module", switch to the |
| _failed signature_ state. Otherwise, expect the string |
| "```SKY MODULE```", with _after signature_ as the _success_ state, |
| and _failed signature_ as the _failure_ state. |
| |
| * Anything else: Switch to the **failed signature** state. |
| |
| |
| #### **Expect a string** state #### |
| |
| If the current character is not the same as the <i>index</i>th character in |
| _expectation_, then switch to the _failure_ state. |
| |
| Otherwise, consume the character, and increase _index_. If _index_ is |
| now equal to the length of _expectation_, then switch to the _success_ |
| state. |
| |
| |
| #### **After signature** state #### |
| |
| If the current character is... |
| |
| * U+000A: Consume the character and switch to the **data** state. |
| * U+0020: Consume the character and switch to the **consume rest of |
| line** state. |
| * Anything else: Switch to the **failed signature** state. |
| |
| |
| #### **Failed signature** state #### |
| |
| Stop parsing. No tokens are emitted. The file is not a Sky file. |
| |
| |
| #### **Consume rest of line** state #### |
| |
| If the current character is... |
| |
| * U+000A: Consume the character and switch to the **data** state. |
| * Anything else: Consume the character and stay in this state. |
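| |
| Taken together, the **signature**, **expect a string**, **after |
| signature** and **consume rest of line** states decide where the |
| **data** state begins. The following Python sketch condenses them |
| into one function. The names `SIGNATURES` and `data_start` are |
| hypothetical, and returning `None` when the input ends before the |
| **data** state is reached is an assumption made for illustration. |
| |
```python
from typing import Optional

# Expected signature string for each parsing context.
SIGNATURES = {
    "application": "#!mojo mojo:sky",
    "module": "SKY MODULE",
}


def data_start(characters: str, context: str) -> Optional[int]:
    """Return the index of the first character handled by the data state.

    Returns None if the failed signature state would be reached, or
    (an assumption) if the input ends before the data state is reached.
    """
    expectation = SIGNATURES[context]
    # Signature and expect-a-string states: every character must match.
    if not characters.startswith(expectation):
        return None
    i = len(expectation)
    if i >= len(characters):
        return None
    # After signature state.
    if characters[i] == "\n":
        return i + 1
    if characters[i] != " ":
        return None
    # Consume rest of line state: skip up to and including U+000A.
    newline = characters.find("\n", i + 1)
    return None if newline == -1 else newline + 1


if __name__ == "__main__":
    print(data_start("SKY MODULE\n<a>", "module"))
```
| |
| A module signature offered in the "application" context (or the |
| reverse) fails here too, because the first character does not match |
| the expectation for that context. |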
| |
| |
| ### **Data** state ### |
| |
| If the current character is... |
| |
| * '```<```': Consume the character and switch to the **tag open** state. |
| |
| * '```&```': Consume the character and switch to the **character |
| reference** state, with the _return state_ set to the **data** |
| state, the _extra terminating character_ unset (or set to U+0000, |
| which has the same effect), and the _emitting operation_ being to |
| emit a character token for the given character. |
| |
| * Anything else: Emit the current input character as a character |
| token. Consume the character. Stay in this state. |
| |
| |
| ### **Script raw data** state ### |
| |
| If the current character is... |
| |
| * '```<```': Consume the character and switch to the **script raw |
| data: close 1** state. |
| |
| * Anything else: Emit the current input character as a character |
| token. Consume the character. Stay in this state. |
| |
| |
| ### **Script raw data: close 1** state ### |
| |
| If the current character is... |
| |
| * '```/```': Consume the character and switch to the **script raw |
| data: close 2** state. |
| |
| * Anything else: Emit '```<```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 2** state ### |
| |
| If the current character is... |
| |
| * '```s```': Consume the character and switch to the **script raw |
| data: close 3** state. |
| |
| * Anything else: Emit '```</```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 3** state ### |
| |
| If the current character is... |
| |
| * '```c```': Consume the character and switch to the **script raw |
| data: close 4** state. |
| |
| * Anything else: Emit '```</s```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 4** state ### |
| |
| If the current character is... |
| |
| * '```r```': Consume the character and switch to the **script raw |
| data: close 5** state. |
| |
| * Anything else: Emit '```</sc```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 5** state ### |
| |
| If the current character is... |
| |
| * '```i```': Consume the character and switch to the **script raw |
| data: close 6** state. |
| |
| * Anything else: Emit '```</scr```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 6** state ### |
| |
| If the current character is... |
| |
| * '```p```': Consume the character and switch to the **script raw |
| data: close 7** state. |
| |
| * Anything else: Emit '```</scri```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 7** state ### |
| |
| If the current character is... |
| |
| * '```t```': Consume the character and switch to the **script raw |
| data: close 8** state. |
| |
| * Anything else: Emit '```</scrip```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Script raw data: close 8** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A, '```/```', '```>```': Create an end tag token, and |
| let its tag name be the string '```script```'. Switch to the |
| **before attribute name** state without consuming the character. |
| |
| * Anything else: Emit '```</script```' character tokens. Consume the |
| character. Switch to the **script raw data** state. |
| |
| |
| ### **Style raw data** state ### |
| |
| If the current character is... |
| |
| * '```<```': Consume the character and switch to the **style raw |
| data: close 1** state. |
| |
| * Anything else: Emit the current input character as a character |
| token. Consume the character. Stay in this state. |
| |
| |
| ### **Style raw data: close 1** state ### |
| |
| If the current character is... |
| |
| * '```/```': Consume the character and switch to the **style raw |
| data: close 2** state. |
| |
| * Anything else: Emit '```<```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 2** state ### |
| |
| If the current character is... |
| |
| * '```s```': Consume the character and switch to the **style raw |
| data: close 3** state. |
| |
| * Anything else: Emit '```</```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 3** state ### |
| |
| If the current character is... |
| |
| * '```t```': Consume the character and switch to the **style raw |
| data: close 4** state. |
| |
| * Anything else: Emit '```</s```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 4** state ### |
| |
| If the current character is... |
| |
| * '```y```': Consume the character and switch to the **style raw |
| data: close 5** state. |
| |
| * Anything else: Emit '```</st```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 5** state ### |
| |
| If the current character is... |
| |
| * '```l```': Consume the character and switch to the **style raw |
| data: close 6** state. |
| |
| * Anything else: Emit '```</sty```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 6** state ### |
| |
| If the current character is... |
| |
| * '```e```': Consume the character and switch to the **style raw |
| data: close 7** state. |
| |
| * Anything else: Emit '```</styl```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Style raw data: close 7** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A, '```/```', '```>```': Create an end tag token, and |
| let its tag name be the string '```style```'. Switch to the |
| **before attribute name** state without consuming the character. |
| |
| * Anything else: Emit '```</style```' character tokens. Consume the |
| character. Switch to the **style raw data** state. |
| |
| |
| ### **Tag open** state ### |
| |
| If the current character is... |
| |
| * '```!```': Consume the character and switch to the **comment start |
| 1** state. |
| |
| * '```/```': Consume the character and switch to the **close tag** |
| state. |
| |
| * '```>```': Emit character tokens for '```<>```'. Consume the current |
| character. Switch to the **data** state. |
| |
| * '```0```'..'```9```', '```a```'..'```z```', '```A```'..'```Z```', |
| '```-```', '```_```', '```.```': Create a start tag token, let its |
| tag name be the current character, consume the current character and |
| switch to the **tag name** state. |
| |
| * Anything else: Emit the character token for '```<```'. Switch to the |
| **data** state without consuming the current character. |
| |
| |
| ### **Close tag** state ### |
| |
| If the current character is... |
| |
| * '```>```': Emit character tokens for '```</>```'. Consume the current |
| character. Switch to the **data** state. |
| |
| * '```0```'..'```9```', '```a```'..'```z```', '```A```'..'```Z```', |
| '```-```', '```_```', '```.```': Create an end tag token, let its |
| tag name be the current character, consume the current character and |
| switch to the **tag name** state. |
| |
| * Anything else: Emit the character tokens for '```</```'. Switch to |
| the **data** state without consuming the current character. |
| |
| |
| ### **Tag name** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Switch to the |
| **before attribute name** state. |
| |
| * '```/```': Consume the current character. Switch to the **void tag** |
| state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * Anything else: Append the current character to the tag name, and |
| consume the current character. Stay in this state. |
| |
| |
| ### **Void tag** state ### |
| |
| If the current character is... |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * Anything else: Switch to the **before attribute name** state without |
| consuming the current character. |
| |
| |
| ### **Before attribute name** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Stay in this state. |
| |
| * '```/```': Consume the current character. Switch to the **void tag** |
| state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * Anything else: Create a new attribute in the tag token, and set its |
| name to the current character. Consume the current character. Switch |
| to the **attribute name** state. |
| |
| |
| ### **Attribute name** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Switch to the **after |
| attribute name** state. |
| |
| * '```/```': Consume the current character. Switch to the **void tag** |
| state. |
| |
| * '```=```': Consume the current character. Switch to the **before |
| attribute value** state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * Anything else: Append the current character to the most recently |
| added attribute's name, and consume the current character. Stay in |
| this state. |
| |
| |
| ### **After attribute name** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Stay in this state. |
| |
| * '```/```': Consume the current character. Switch to the **void tag** |
| state. |
| |
| * '```=```': Consume the current character. Switch to the **before |
| attribute value** state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * Anything else: Create a new attribute in the tag token, and set its |
| name to the current character. Consume the current character. Switch |
| to the **attribute name** state. |
| |
| |
| ### **Before attribute value** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Stay in this state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * '```'```': Consume the current character. Switch to the |
| **single-quoted attribute value** state. |
| |
| * '```"```': Consume the current character. Switch to the |
| **double-quoted attribute value** state. |
| |
| * Anything else: Set the value of the most recently added attribute to |
| the current character. Consume the current character. Switch to the |
| **unquoted attribute value** state. |
| |
| |
| ### **Single-quoted attribute value** state ### |
| |
| If the current character is... |
| |
| * '```'```': Consume the current character. Switch to the |
| **before attribute name** state. |
| |
| * '```&```': Consume the character and switch to the **character |
| reference** state, with the _return state_ set to the |
| **single-quoted attribute value** state, the _extra terminating |
| character_ set to '```'```', and the _emitting operation_ being to |
| append the given character to the value of the most recently added |
| attribute. |
| |
| * Anything else: Append the current character to the value of the most |
| recently added attribute. Consume the current character. Stay in |
| this state. |
| |
| |
| ### **Double-quoted attribute value** state ### |
| |
| If the current character is... |
| |
| * '```"```': Consume the current character. Switch to the |
| **before attribute name** state. |
| |
| * '```&```': Consume the character and switch to the **character |
| reference** state, with the _return state_ set to the |
| **double-quoted attribute value** state, the _extra terminating |
| character_ set to '```"```', and the _emitting operation_ being to |
| append the given character to the value of the most recently added |
| attribute. |
| |
| * Anything else: Append the current character to the value of the most |
| recently added attribute. Consume the current character. Stay in |
| this state. |
| |
| |
| ### **Unquoted attribute value** state ### |
| |
| If the current character is... |
| |
| * U+0020, U+000A: Consume the current character. Switch to the |
| **before attribute name** state. |
| |
| * '```>```': Consume the current character. Switch to the **after |
| tag** state. |
| |
| * '```&```': Consume the character and switch to the **character |
| reference** state, with the _return state_ set to the **unquoted |
| attribute value** state, the _extra terminating character_ unset (or |
| set to U+0000, which has the same effect), and the _emitting |
| operation_ being to append the given character to the value of the |
| most recently added attribute. |
| |
| * Anything else: Append the current character to the value of the most |
| recently added attribute. Consume the current character. Stay in |
| this state. |
| |
| |
| ### **After tag** state ### |
| |
| Emit the tag token. |
| |
| If the tag token was a start tag token and the tag name was |
| '```script```', then switch to the **script raw data** state. |
| |
| Otherwise, if the tag token was a start tag token and the tag name was |
| '```style```', then switch to the **style raw data** state. |
| |
| Otherwise, switch to the **data** state. |
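| |
| The choice of the next state can be sketched as a small Python |
| function. The function name `state_after_tag` and the string labels |
| used for token kinds are hypothetical; only the state names come from |
| this section. |
| |
```python
def state_after_tag(kind: str, tag_name: str) -> str:
    """Return the state to switch to once a tag token has been emitted.

    `kind` is "start" or "end" (hypothetical labels for the token type).
    """
    if kind == "start" and tag_name == "script":
        return "script raw data"
    if kind == "start" and tag_name == "style":
        return "style raw data"
    return "data"


if __name__ == "__main__":
    print(state_after_tag("start", "script"))
```
| |
| End tags always return to the **data** state, including the |
| '```script```' and '```style```' end tags created by the raw data |
| states. |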
| |
| |
| ### **Comment start 1** state ### |
| |
| If the current character is... |
| |
| * '```-```': Consume the character and switch to the **comment start |
| 2** state. |
| |
| * '```>```': Emit character tokens for '```<!>```'. Consume the |
| current character. Switch to the **data** state. |
| |
| |
| ### **Comment start 2** state ### |
| |
| If the current character is... |
| |
| * '```-```': Consume the character and switch to the **comment** |
| state. |
| |
| * '```>```': Emit character tokens for '```<!->```'. Consume the |
| current character. Switch to the **data** state. |
| |
| |
| ### **Comment** state ### |
| |
| If the current character is... |
| |
| * '```-```': Consume the character and switch to the **comment end 1** |
| state. |
| |
| * Anything else: Consume the character and stay in this state. |
| |
| |
| ### **Comment end 1** state ### |
| |
| If the current character is... |
| |
| * '```-```': Consume the character and switch to the **comment end 2** |
| state. |
| |
| * Anything else: Consume the character, and switch to the **comment** |
| state. |
| |
| |
| ### **Comment end 2** state ### |
| |
| If the current character is... |
| |
| * '```>```': Consume the character and switch to the **data** state. |
| |
| * '```-```': Consume the character, but stay in this state. |
| |
| * Anything else: Consume the character, and switch to the **comment** |
| state. |
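| |
| The **comment**, **comment end 1** and **comment end 2** states can |
| be sketched as a loop over the characters that follow '```<!--```'. |
| This is a minimal Python illustration; the function name |
| `skip_comment` is hypothetical, and returning `None` when the input |
| ends inside a comment is an assumption. |
| |
```python
from typing import Optional


def skip_comment(characters: str, start: int) -> Optional[int]:
    """Return the index just past the end of a comment.

    `start` is the index of the first character after "<!--".
    Returns None (an assumption) if the input ends inside the comment.
    """
    state = "comment"
    for i in range(start, len(characters)):
        c = characters[i]
        if state == "comment":
            if c == "-":
                state = "comment end 1"
        elif state == "comment end 1":
            state = "comment end 2" if c == "-" else "comment"
        else:  # comment end 2
            if c == ">":
                return i + 1  # back to the data state
            if c != "-":
                state = "comment"
    return None


if __name__ == "__main__":
    text = "<!-- note -->rest"
    print(text[skip_comment(text, 4):])
```
| |
| Because a '```-```' in the **comment end 2** state keeps the machine |
| in that state, a run of three or more hyphens followed by '```>```' |
| also closes the comment. |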
| |
| |
| ### **Character reference** state ### |
| |
| Let _raw value_ be the string '```&```'. |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```#```': Consume the character, and switch to the **numeric |
| character reference** state. |
| |
| * '```l```': Consume the character and switch to the **named character |
| reference L** state. |
| |
| * '```a```': Consume the character and switch to the **named character |
| reference A** state. |
| |
| * '```g```': Consume the character and switch to the **named character |
| reference G** state. |
| |
| * '```q```': Consume the character and switch to the **named character |
| reference Q** state. |
| |
| * Any other character in the range '```0```'..'```9```', |
| '```a```'..'```f```', '```A```'..'```F```': Consume the character |
| and switch to the **bad named character reference** state. |
| |
| * Anything else: Run the _emitting operation_ for all but the last |
| character in _raw value_, and switch to the **data** state without |
| consuming the current character. |
| |
| |
| ### **Numeric character reference** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```x```', '```X```': Let _value_ be zero, consume the character, |
| and switch to the **hexadecimal numeric character reference** state. |
| |
| * '```0```'..'```9```': Let _value_ be the numeric value of the |
| current character interpreted as a decimal digit, consume the |
| character, and switch to the **decimal numeric character reference** |
| state. |
| |
| * Anything else: Run the _emitting operation_ for all but the last |
| character in _raw value_, and switch to the **data** state without |
| consuming the current character. |
| |
| |
| ### **Hexadecimal numeric character reference** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```0```'..'```9```', '```a```'..'```f```', '```A```'..'```F```': |
| Let _value_ be sixteen times _value_ plus the numeric value of the |
| current character interpreted as a hexadecimal digit. Consume the |
| character and stay in this state. |
| |
| * '```;```': Consume the character. If _value_ is between 0x0001 and |
| 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive, |
| run the _emitting operation_ with a Unicode character having the |
| scalar value _value_; otherwise, run the _emitting operation_ with |
| the character U+FFFD. Then, in either case, switch to the _return |
| state_. |
| |
| * Anything else: Run the _emitting operation_ for all but the last |
| character in _raw value_, and switch to the **data** state without |
| consuming the current character. |
| |
| |
| ### **Decimal numeric character reference** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```0```'..'```9```': Let _value_ be ten times _value_ plus the |
| numeric value of the current character interpreted as a decimal |
| digit. Consume the character and stay in this state. |
| |
| * '```;```': Consume the character. If _value_ is between 0x0001 and |
| 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive, |
| run the _emitting operation_ with a Unicode character having the |
| scalar value _value_; otherwise, run the _emitting operation_ with |
| the character U+FFFD. Then, in either case, switch to the _return |
| state_. |
| |
| * Anything else: Run the _emitting operation_ for all but the last |
| character in _raw value_, and switch to the **data** state without |
| consuming the current character. |
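| |
| The arithmetic shared by the hexadecimal and decimal states can be |
| sketched in Python. The function names `accumulate` and |
| `character_for_value` are hypothetical. The sketch covers only the |
| digit accumulation and the range check applied at '```;```', not the |
| surrounding state transitions. |
| |
```python
REPLACEMENT = "\ufffd"


def accumulate(digits: str, base: int) -> int:
    """Fold digits into a value, one character at a time.

    Each step is value = base * value + digit, with base 16 or 10.
    """
    value = 0
    for digit in digits:
        value = base * value + int(digit, base)
    return value


def character_for_value(value: int) -> str:
    """Pick the character emitted when ';' ends a numeric reference."""
    in_range = 0x0001 <= value <= 0x10FFFF
    is_surrogate = 0xD800 <= value <= 0xDFFF
    if in_range and not is_surrogate:
        return chr(value)
    return REPLACEMENT


if __name__ == "__main__":
    print(character_for_value(accumulate("41", 16)))
```
| |
| Python integers do not overflow, so very long digit runs simply |
| produce a value above 0x10FFFF and map to U+FFFD. An implementation |
| using fixed-width integers would need to guard against overflow. |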
| |
| |
| ### **Named character reference L** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```t```': Let _character_ be '```<```', consume the current |
| character, and switch to the **after named character reference** |
| state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference A** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```p```': Consume the current character and switch to the **named |
| character reference AP** state. |
| |
| * '```m```': Consume the current character and switch to the **named |
| character reference AM** state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference AM** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```p```': Let _character_ be '```&```', consume the current |
| character, and switch to the **after named character reference** |
| state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference AP** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```o```': Consume the current character and switch to the **named |
| character reference APO** state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference APO** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```s```': Let _character_ be '```'```', consume the current |
| character, and switch to the **after named character reference** |
| state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference G** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```t```': Let _character_ be '```>```', consume the current |
| character, and switch to the **after named character reference** |
| state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference Q** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```u```': Consume the current character and switch to the **named |
| character reference QU** state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference QU** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```o```': Consume the current character and switch to the **named |
| character reference QUO** state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **Named character reference QUO** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```t```': Let _character_ be '```"```', consume the current |
| character, and switch to the **after named character reference** |
| state. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the character. |
| |
| |
| ### **After named character reference** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```;```': Consume the character. Run the _emitting operation_ with |
| the character _character_. Switch to the _return state_. |
| |
| * The _extra terminating character_: Run the _emitting operation_ with |
| the character U+FFFD. Switch to the _return state_ without consuming |
| the current character. |
| |
| * Anything else: Switch to the _bad named character reference_ state |
| without consuming the current character. |
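| |
| The named character reference states recognise exactly five names. |
| The following Python sketch lists them and mirrors the three cases of |
| the **after named character reference** state. The names |
| `NAMED_CHARACTERS` and `after_named_reference` are hypothetical, and |
| `None` is used here to stand for a switch to the **bad named |
| character reference** state. |
| |
```python
from typing import Optional

# Name spelled out by the state machine -> value of _character_.
NAMED_CHARACTERS = {
    "lt": "<",
    "amp": "&",
    "apos": "'",
    "gt": ">",
    "quot": '"',
}


def after_named_reference(
    character: str,
    current: str,
    extra_terminator: Optional[str] = None,
) -> Optional[str]:
    """Return the character passed to the emitting operation.

    Returns None when the machine would instead switch to the bad
    named character reference state.
    """
    if current == ";":
        return character
    if extra_terminator is not None and current == extra_terminator:
        return "\ufffd"
    return None


if __name__ == "__main__":
    print(after_named_reference(NAMED_CHARACTERS["amp"], ";"))
```
| |
| Only the '```;```' case consumes the current character; in the other |
| two cases the character is left for the next state to process. |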
| |
| |
| ### **Bad named character reference** state ### |
| |
| Append the current character to _raw value_. |
| |
| If the current character is... |
| |
| * '```;```': Consume the character. Run the _emitting operation_ with |
| the character U+FFFD. Switch to the _return state_. |
| |
| * The _extra terminating character_: Switch to the _return state_ |
| without consuming the current character. |
| |
| * Any other character in the range '```0```'..'```9```', |
| '```a```'..'```f```', '```A```'..'```F```': Consume the character |
| and stay in this state. |
| |
| * Anything else: Run the _emitting operation_ for all but the last |
| character in _raw value_, and switch to the **data** state without |
| consuming the current character. |
| |
| |
| Tree construction |
| ----------------- |
| |
| To construct a node tree from a _sequence of tokens_ and a document _document_: |
| |
| 1. Initialise the _stack of open nodes_ to contain only _document_. |
| 2. Consider each token _token_ in the _sequence of tokens_ in turn. |
| - If _token_ is a text token, |
| 1. Create a text node _node_ with character data _token.data_. |
| 2. Append _node_ to the top node in the _stack of open nodes_. |
| - If _token_ is a start tag token, |
| 1. Create an element _node_ with tag name _token.tagName_ and attributes |
| _token.attributes_. |
| 2. Append _node_ to the top node in the _stack of open nodes_. |
| 3. If the _token.selfClosing_ flag is not set, push _node_ onto the |
| _stack of open nodes_. |
| 4. If _token.tagName_ is _script_, TODO: Execute the script. |
| - If _token_ is an end tag token, |
| 1. If the _stack of open nodes_ contains a node whose _tagName_ is |
| _token.tagName_, |
| - Pop nodes from the _stack of open nodes_ until a node with |
| a _tagName_ equal to _token.tagName_ has been popped. |
| 2. Otherwise, ignore _token_. |
| - If _token_ is a comment token, |
| 1. Ignore _token_. |
| - If _token_ is an EOF token, |
| 1. Pop all the nodes from the _stack of open nodes_. |
| 2. Signal _document_ that parsing is complete. |
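| |
| The steps above can be sketched in Python. The `Node` and `Token` |
| classes, their field names, and the `construct_tree` function are |
| hypothetical shapes chosen for illustration. Script execution and the |
| completion signal are not modelled. |
| |
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str  # "document", "element" or "text"
    tag_name: Optional[str] = None
    attributes: dict = field(default_factory=dict)
    data: str = ""
    children: list = field(default_factory=list)


@dataclass
class Token:
    kind: str  # "text", "start", "end", "comment" or "eof"
    tag_name: Optional[str] = None
    attributes: dict = field(default_factory=dict)
    data: str = ""
    self_closing: bool = False


def construct_tree(tokens, document: Node) -> Node:
    stack = [document]  # the stack of open nodes; the top is stack[-1]
    for token in tokens:
        if token.kind == "text":
            stack[-1].children.append(Node("text", data=token.data))
        elif token.kind == "start":
            node = Node("element", tag_name=token.tag_name,
                        attributes=dict(token.attributes))
            stack[-1].children.append(node)
            if not token.self_closing:
                stack.append(node)
        elif token.kind == "end":
            # Pop only if a matching open node exists; otherwise ignore.
            if any(n.tag_name == token.tag_name for n in stack):
                while stack.pop().tag_name != token.tag_name:
                    pass
        elif token.kind == "eof":
            stack.clear()
        # Comment tokens are ignored.
    return document


if __name__ == "__main__":
    doc = construct_tree(
        [Token("start", tag_name="a"), Token("text", data="hi"),
         Token("end", tag_name="a"), Token("eof")],
        Node("document"),
    )
    print(doc.children[0].tag_name, doc.children[0].children[0].data)
```
| |
| An end tag closes every node opened after its matching start tag, so |
| unclosed descendants are closed implicitly, while an end tag with no |
| matching open node has no effect. |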
| |
| TODO(ianh): '```<template>```', '```<t>```' |