- End tags now emit errors if they have attributes
- End tags now emit errors if they are self-closing
- The last character before EOF is now correctly reconsumed
Also changed the tokenizer debug log to be zero-cost
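The two end tag checks above can be sketched as follows. This is a minimal illustration, not the tokenizer's actual code: the EndTagToken class and the endTagErrors function are hypothetical, and the error codes are the ones the HTML specification names for these cases.

```php
<?php
declare(strict_types=1);

// Hypothetical token class; the tokenizer's real token classes may differ.
class EndTagToken {
    public string $name;
    public array $attributes = [];
    public bool $selfClosing = false;

    public function __construct(string $name) {
        $this->name = $name;
    }
}

// Returns the parse error codes an end tag should raise when it is emitted.
function endTagErrors(EndTagToken $token): array {
    $errors = [];
    if (count($token->attributes) > 0) {
        $errors[] = 'end-tag-with-attributes';
    }
    if ($token->selfClosing) {
        $errors[] = 'end-tag-with-trailing-solidus';
    }
    return $errors;
}
```

Both checks run at emission time, so the token itself is still passed on unchanged.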
- Implemented missing states (except entity and char ref states)
- Re-copied and reformatted most text from the specification
- Emitted parse errors per spec (except invalid characters)
- Properly handled null characters
- Passed through invalid characters (these do not yet emit errors)
- Added assertions before manipulation of tokens and temporary buffers
- Removed problematic optimizations
- Removed explicit continue statements
- Allowed end tags to have attributes
- Simplified duplicate attribute detection
- Corrected DOCTYPE properties not being "missing"
- Skipped BOM in encoding-neutral way
I may have introduced regressions, and the assertions are mostly serving to
mask undefined-variable errors rather than helping to fix them, but at least
warnings and notices are not being spammed this way.
Work still needs to be done in emitting errors for invalid characters (and
invalid character sequences), as well as in consuming character
references and entities correctly, not to mention general debugging.
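For the duplicate attribute detection mentioned in the list above, one simple approach is to key attributes by name and refuse a name that is already present. This is a sketch under that assumption, not necessarily what the tokenizer does; addAttribute is a hypothetical helper. Per the specification, the first attribute wins and the later one is dropped with a duplicate-attribute parse error.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: adds an attribute to a tag token's attribute map.
// Returns false when the name is already taken, which is the point at
// which a duplicate-attribute parse error would be emitted.
function addAttribute(array &$attributes, string $name, string $value): bool {
    if (array_key_exists($name, $attributes)) {
        // The first occurrence wins; the new value is discarded.
        return false;
    }
    $attributes[$name] = $value;
    return true;
}
```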
Everything which can emit a parse error should have the error handler
and data stream as properties and use the ParseErrorEmitter trait to
avoid complicating the task of actually producing an error.
Normally the Parser would be expected to set the error handler before it
begins (this commit does not do this) and unset it after it's done.
Alternatively, the entire means of reporting errors can now be easily
replaced.
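A minimal sketch of the arrangement described above, for illustration only. The ParseErrorEmitter trait is named in this message, but its body here is a guess; the ErrorHandler and DataStream classes, the emit and error methods, and the property names are all hypothetical stand-ins.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the data stream; only the position is modeled.
class DataStream {
    public int $line = 1;
    public int $column = 0;
}

// Hypothetical error handler that collects errors instead of printing them.
class ErrorHandler {
    public array $errors = [];

    public function emit(DataStream $data, string $code): bool {
        $this->errors[] = [$data->line, $data->column, $code];
        return true;
    }
}

trait ParseErrorEmitter {
    public ?ErrorHandler $errorHandler = null;
    public ?DataStream $data = null;

    // Forwards the error together with the current stream position.
    protected function error(string $code): bool {
        if ($this->errorHandler === null || $this->data === null) {
            // No handler set: the error is dropped.
            return false;
        }
        return $this->errorHandler->emit($this->data, $code);
    }
}

// Any class that can emit parse errors just uses the trait.
class Tokenizer {
    use ParseErrorEmitter;

    public function endTagWithAttributes(): bool {
        return $this->error('end-tag-with-attributes');
    }
}
```

Under this sketch, the Parser would assign errorHandler and data before it starts and reset them to null when it finishes; swapping in a different ErrorHandler replaces the reporting mechanism wholesale.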
• Updated mensbeam/intl dependency.
• Moved scope methods from Element to OpenElementsStack. They are only needed by the parser and don't make sense on Element.
• Cleaned up parse errors. Displaying what is expected or found is not helpful.
• Fixed bug where Data::consumeWhile and Data::consumeUntil wouldn't move the pointer back one position if there were no matches.
• Renamed DataStream to Data.
• Made each class have its own debug static property so each can print debug information separately.
• Fixed bug where the document was being rewritten during tree building and was therefore not output when the parser completed.
• Allowed DOM to be instantiated; an instance holds an implementation and a document so the tree builder can create a document when a doctype is found.
• Renamed the parser instance variable from Parser::$self to Parser::$instance.
• Added parse errors for entities to ParseError.
• Moved Parser::fixDOM to DOM::fixIdAttributes.
• Added an exception for when the tokenizer enters an invalid state (infinite looping).
• Made ParseError use Parser::$instance->data instead of a passed around DataStream object.
• Removed html5.php; shouldn't have been there to begin with.
• Fixed bug where passing the wrong number of parameters to ParseError::trigger left it without the correct exception to throw.
• Changes to the spec since the last edit required a rewrite of the tree building algorithm.
• Searching the stack should go in reverse by default because the spec works that way.
• Rewrote StartTagToken because the token attributes need to be easily editable: per the spec, foreign attributes are edited before the token goes through the element creation process, not after.
• Yes, there's a goto. Sue me.
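The Data::consumeWhile and Data::consumeUntil fix in the list above comes down to one invariant: after either call the pointer rests on the character that ended the run, even when nothing matched. A minimal sketch of that invariant follows. It is byte-oriented and hypothetical throughout; the real Data class, its properties, and its helper methods may look quite different.

```php
<?php
declare(strict_types=1);

// Hypothetical, byte-oriented stand-in for the Data class.
class Data {
    private string $input;
    public int $pointer = 0;

    public function __construct(string $input) {
        $this->input = $input;
    }

    // Returns the next character and advances, or '' at end of input.
    private function consume(): string {
        if ($this->pointer >= strlen($this->input)) {
            return '';
        }
        return $this->input[$this->pointer++];
    }

    public function consumeWhile(string $chars): string {
        return $this->consumeRun($chars, true);
    }

    public function consumeUntil(string $chars): string {
        return $this->consumeRun($chars, false);
    }

    // Collects characters while their membership in $chars equals $wanted.
    private function consumeRun(string $chars, bool $wanted): string {
        $out = '';
        while (($char = $this->consume()) !== '') {
            if ((strpos($chars, $char) !== false) !== $wanted) {
                // Step back over the character that ended the run. This must
                // also happen when the very first character ends it.
                $this->pointer--;
                break;
            }
            $out .= $char;
        }
        return $out;
    }
}
```

With the input "abc<d", calling consumeWhile('xyz') first returns an empty string and leaves the pointer at 0, so a following consumeUntil('<') still sees the whole "abc".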