- End tags now emit errors if they have attributes
- End tags now emit errors if they are self-closing
- The last character before EOF is now correctly reconsumed
Also changed the tokenizer debug log to be zero-cost
- Implemented missing states (except entity and char ref states)
- Re-copied and reformatted most text from the specification
- Emitted parse errors per spec (except invalid characters)
- Properly handled null characters
- Passed through invalid characters (these do not yet emit errors)
- Added assertions before manipulation of tokens and temporary buffers
- Removed problematic optimizations
- Removed explicit continue statements
- Allowed end tags to have attributes
- Simplified duplicate attribute detection
- Corrected DOCTYPE properties not being "missing"
- Skipped BOM in encoding-neutral way
I may have introduced regressions, and the assertions are mostly serving to
mask undefined-variable errors rather than helping to fix them, but at least
warnings and notices are not being spammed this way.
Work still needs to be done in emitting errors for invalid characters (and
invalid character sequences), as well as in consuming character
references and entities correctly, not to mention general debugging.
Everything which can emit a parse error should have the error handler
and data stream as properties and use the ParseErrorEmitter trait to
avoid complicating the task of actually producing an error.
Normally the Parser would be expected to set the error handler before it
begins (this commit does not do this) and unset it after it's done.
Alternatively, the entire means of reporting errors can now be easily
replaced.
The parse error class checks for the correct number of arguments passed to its
emitter, and it does seem to have correctly complained about the empty tag errors.
Duplicate attributes should not be checked if the token is an end tag.
Finally, used null coalescing to silence an undefined variable
• Updated mensbeam/intl dependency.
• Moved scope methods from Element to OpenElementsStack. They don't need to be used outside of the parser and don't make sense on Element.
• Cleaned up parse errors. Displaying what is expected or found is not helpful.
• Added exceptions for disabled inherited DOMDocument methods that don't make sense in an HTML5 library.
• Moved the inherited methods from the Printing trait to Document, as DocumentFragment and Element don't inherit anything to print.
• Moved the ancestor and descendant methods into their own traits along with the compare method which they share.
• Made DocumentFragment use only the descendant methods and not the ancestor ones.
• Fixed error in README.
• Added an option to use Document::loadHTML or Document::load to parse a document.
• Made the DOM elements use dW\HTML5 namespace instead of dW\HTML5\DOM.
• Fixed where TreeBuilder wasn't being properly destructed when the parser is finished.
• Discovered a way to extend the existing DOM tools, so Parser now works with custom classes for the DOM.
• \dW\HTML5\DOM is no longer needed, and its functionality has been split between DOM\Document and DOM\Element.
• Printer's functionality is now handled by DOM\Document::saveHTML through a trait to make maintenance easier.
• Normalized Exception constants.
• Fixed various typing bugs.
• Oops. Didn't have the \ in front of the DOM classes.
• Forgot function keyword on Printer::escapeString.
• Finished implementing the spec in Printer::serialize.