HTML-Parser

Commit Graph

Author	SHA1	Message	Date
J. King	e8b3c76046	Fix most failures Also removed assertions	5 years ago
J. King	59456b078f	Fix consuming of overlong entity	5 years ago
J. King	e8f35e92fb	Character reference fixes One test in the "entities.test" file is still failing	5 years ago
J. King	b9b892e6a6	Remove obsolete character reference consumer	5 years ago
J. King	19fb541806	New from-scratch character reference consumer	5 years ago
J. King	67c7f382e2	Prep for character references - Add missing state constants - Break all existing deviations for character refs - Add assertions before use of $attribute - Also fix DOCTYPE state	5 years ago
J. King	d4a7280405	Renumber states to match specification sections	5 years ago
J. King	4759f94771	Trim whitespace	5 years ago
J. King	cf41984e88	Fix comment end state	5 years ago
J. King	43f380c1f9	Fix EOF and end tags - End tags now emit errors if they have attributes - End tags now emit errors if they are self-closing - The last character before EOF is now correctly reconsumed Also changed the tokenizer debug log to be zero-cost	5 years ago
J. King	d08438052a	Baseline pass over tokenizer - Implemented missing states (except entity and char ref states) - Re-copied and reformatted most text from the specification - Emitted parse errors per spec (except invalid characters) - Properly handled null characters - Passed through invalid characters (these do not yet emit errors) - Added assertions before manipulation of tokens and temporary buffers - Removed problematic optimizations - Removed explicit continue statements - Allowed end tags to have attributes - Simplified duplicate attribute detection - Corrected DOCTYPE properties not being "missing" - Skipped BOM in encoding-neutral way I may have introduced regressions, and the assertions are mostly serving to mask undefined-variable errors rather than helping to fix them, but at least warnings and notices are not being spammed this way. Work still needs to be done in emitting errors for invalid characters (and invalid character sequences), as well as in consuming character references and entities correctly, not to mention general debugging.	5 years ago
J. King	4e4aee2edd	Update intl dependency	5 years ago
Dustin Wilson	a0c3883363	Another infinite loop in Tokenizer caused by Data	5 years ago
J. King	49820afe7d	Fix broken assertion	5 years ago
J. King	362bb00158	Fix accidental exception instantiation loop	5 years ago
J. King	3c7a76bce1	Retrofit tree builder for new error emitter Also fixed a number of undefined variable errors and erroneous non-root namespace references	5 years ago
J. King	6b42f08fbc	Change some if-the-exception blocks to assertions This has only been done in some parts of the code that are internal to the parser at large.	5 years ago
J. King	af57117c23	Silence parse errors for now	5 years ago
J. King	bb2a7b5a95	Rewrite how parse errors are handled Everything which can emit a parse error should have the error handler and data stream as properties and use the ParseErrorEmitter trait to avoid complicating the task of actually producing an error. Normally the Parser would be expected to set the error handler before it begins (this commit does not do this) and unset it after it's done. Alternatively, the entire means of reporting errors can now be easily replaced.	5 years ago
J. King	d93fe25e58	Combine character tokens in test harness	5 years ago
J. King	8644b6c757	Explicitly index state names and error messages	5 years ago
J. King	51ac79128b	Multiple minor fixes	5 years ago
Dustin Wilson	30003fce1f	Fixed various issues with Data::consumeCharacterReference	5 years ago
J. King	5fb58054ff	Don't read past the end of $args	5 years ago
J. King	1beb934789	Add more tests	5 years ago
Dustin Wilson	ab507a177f	Data::consumeCharacterReference checked for false instead of empty string	5 years ago
J. King	c5a300655c	Ensure test data are present	5 years ago
J. King	223562e035	Fix known errors so far Parse error class checks for correct number of arguments to its emitter and does seem to have correctly complained about the empty tag errors Duplicate attributes should not be checked if the token is an end tag Finally, used null coalescing to silence an undefined variable	5 years ago
Dustin Wilson	64d8a2ab2c	Fixed infinite loop caused by Data::consumeWhile and consumeUntil	5 years ago
J. King	f360206a34	Basic endless loop helper	5 years ago
J. King	1386eb103c	Fix test transformer	5 years ago
J. King	1971892635	Basic skeleton of test suite	5 years ago
Dustin Wilson	205c56679a	TreeBuilder progress	5 years ago
Dustin Wilson	08ebc11ce7	More TreeBuilder stuff	6 years ago
Dustin Wilson	b4c3c08800	Another daily TreeBuilder	6 years ago
Dustin Wilson	0624e0be93	Pushing forward on TreeBuilder • Updated mensbeam/intl dependency. • Moved scope methods from Element to OpenElementsStack. They don't need to be used outside of the parser and don't make sense there. • Cleaned up parse errors. Displaying what is expected or found is not helpful.	6 years ago
Dustin Wilson	6e7145c022	More daily TreeBuilder stuff	6 years ago
Dustin Wilson	8e87149419	Daily TreeBuilder additions	6 years ago
Dustin Wilson	f5e6179fad	Daily updates to token emitting	6 years ago
Dustin Wilson	63fe744864	Printing 👌🏻 Made each of the node types serialize themselves through __toString with serialize used in Document, DocumentFragment, and Element.	6 years ago
Dustin Wilson	ab1a78c192	Modifying printing • Added exceptions for disabled inherited DOMDocument methods that don't make sense in an HTML5 library. • Moved the inherited methods from the Printing trait to Document as DocumentFragment and Element don't inherit anything to print.	6 years ago
Dustin Wilson	dfda8d5f3a	Organization • Moved the ancestor and descendant methods into their own traits along with the compare method which they share. • Made DocumentFragment use only the descendant methods and not the ancestor ones. • Fixed error in README.	6 years ago
Dustin Wilson	fd6003fb4e	Added additional entry point • Added an option to use Document::loadHTML or Document::load to parse a document. • Made the DOM elements use dW\HTML5 namespace instead of dW\HTML5\DOM. • Fixed where TreeBuilder wasn't being properly destructed when the parser is finished.	6 years ago
Dustin Wilson	86c6577752	Forgotten fixes	6 years ago
Dustin Wilson	5af174d346	Rewrote DOM Tools • Discovered a way to extend the existing DOM tools, so Parser now works with custom classes for the DOM. • \dW\HTML5\DOM is no longer needed, and its functionality has been split between DOM\Document and DOM\Element. • Printer's functionality is now handled by DOM\Document\saveHTML through a trait to make maintenance easier. • Normalized Exception constants. • Fixed various typing bugs.	6 years ago
Dustin Wilson	69fa554d43	Updated \mensbeam\intl dependency	6 years ago
Dustin Wilson	f948e3ffad	Fix Printer bugs • Oops. Didn't have the \ in front of the DOM classes. • Forgot function keyword on Printer::escapeString. • Finished out the spec on Printer::serialize.	6 years ago
Dustin Wilson	d6dada9c5e	Printer progress	6 years ago
Dustin Wilson	c5631050cc	Started Printer	6 years ago
Dustin Wilson	66ec4dab27	Fix character reference parsing	6 years ago


87 Commits (e8b3c76046d6cebcbf79d6591f5b0b0b6eeaf711)
