MensBeam/HTML-DOM

Author	SHA1	Message	Date
J. King	d33929f4a1	Change namespace; add copyright info	2021-03-21 17:38:05 -04:00
J. King	aaf85387be	Remove uses of is_null for consistency	2021-03-21 12:33:24 -04:00
J. King	82621a11e3	Sort out namespaced attributes	2021-03-18 12:40:54 -04:00
J. King	6cac402375	Minor cleanup	2021-03-16 14:42:21 -04:00
J. King	7f53465951	Fix remaining error positions	2021-03-13 21:00:59 -05:00
J. King	c6c51475cf	Convert tokenizer to generator. Some error positions still need to be fixed.	2021-03-13 18:03:15 -05:00
J. King	3f23040e1d	Fix most parse error counts. More remain, though most have been addressed.	2021-03-10 22:42:53 -05:00
J. King	01361efdb8	Various fixes	2021-03-06 21:41:12 -05:00
J. King	752ab05464	Implement rest of in-body insertion mode	2021-02-20 12:18:03 -05:00
J. King	a8d2ee4174	Fill out more of the "in body" insertion mode. This only passes a few more tests because handling of end tags is still mostly missing.	2021-02-19 20:18:13 -05:00
J. King	baaa00e544	Implement "a" in body. Adoption agency will be handled later.	2021-02-18 23:13:55 -05:00
J. King	6798c128e4	Correct unknown DOCTYPE checking	2021-02-14 19:33:23 -05:00
J. King	a8ff431370	Corrective pass over existing insertion modes	2021-02-14 15:09:00 -05:00
J. King	4e5fd35775	Fix a few tree tests	2021-02-12 23:26:57 -05:00
J. King	00bf9974c5	Fix up most error reporting positions	2019-12-19 22:28:11 -05:00
J. King	58a1177888	Address errors and omissions in error emission. One test still fails, though it is arguably immaterial. This does not account for line and column number, which are known to be mostly off by one.	2019-12-19 15:13:20 -05:00
J. King	ec199f4f11	Report input stream errors	2019-12-18 21:10:18 -05:00
J. King	19fb541806	New from-scratch character reference consumer	2019-12-16 22:39:16 -05:00
J. King	43f380c1f9	Fix EOF and end tags: end tags now emit errors if they have attributes; end tags now emit errors if they are self-closing; the last character before EOF is now correctly reconsumed. Also changed the tokenizer debug log to be zero-cost.	2019-12-15 19:45:59 -05:00
J. King	d08438052a	Baseline pass over tokenizer: implemented missing states (except entity and char ref states); re-copied and reformatted most text from the specification; emitted parse errors per spec (except invalid characters); properly handled null characters; passed through invalid characters (these do not yet emit errors); added assertions before manipulation of tokens and temporary buffers; removed problematic optimizations; removed explicit continue statements; allowed end tags to have attributes; simplified duplicate attribute detection; corrected DOCTYPE properties not being "missing"; skipped BOM in encoding-neutral way. I may have introduced regressions, and the assertions are mostly serving to mask undefined-variable errors rather than helping to fix them, but at least warnings and notices are not being spammed this way. Work still needs to be done in emitting errors for invalid characters (and invalid character sequences), as well as in consuming character references and entities correctly, not to mention general debugging.	2019-12-15 17:47:45 -05:00
Dustin Wilson	a0c3883363	Another infinite loop in Tokenizer caused by Data	2019-12-12 22:45:13 -06:00
J. King	6b42f08fbc	Change some if-the-exception blocks to assertions. This has only been done in some parts of the code that are internal to the parser at large.	2019-12-12 17:35:24 -05:00
J. King	bb2a7b5a95	Rewrite how parse errors are handled. Everything which can emit a parse error should have the error handler and data stream as properties and use the ParseErrorEmitter trait to avoid complicating the task of actually producing an error. Normally the Parser would be expected to set the error handler before it begins (this commit does not do this) and unset it after it's done. Alternatively, the entire means of reporting errors can now be easily replaced.	2019-12-12 15:23:15 -05:00
J. King	8644b6c757	Explicitly index state names and error messages	2019-12-12 10:11:36 -05:00
J. King	51ac79128b	Multiple minor fixes	2019-12-11 23:28:32 -05:00
Dustin Wilson	30003fce1f	Fixed various issues with Data::consumeCharacterReference	2019-12-11 21:38:04 -06:00
Dustin Wilson	0624e0be93	Pushing forward on TreeBuilder • Updated mensbeam/intl dependency. • Moved scope methods from Element to OpenElementsStack. They don't need to be used outside of the parser and don't make sense there. • Cleaned up parse errors. Displaying what is expected or found is not helpful.	2018-09-19 09:09:36 -05:00
Dustin Wilson	33363ab2d3	Fixed Data bug • Fixed bug where Data::consumeWhile and Data::consumeUntil wouldn't move the pointer back one position if there were no matches. • Changed DataStream to Data. • Made each class have its own debug static property so each can print debug information separately.	2018-08-27 14:57:47 -05:00
Dustin Wilson	d95f3e37e4	Fixed document building • The document was being rewritten when tree building and therefore not being output when the parser completed. • Allowed DOM to be instanced, containing an implementation and document so the tree builder can create a document when a doctype is found.	2018-08-17 16:26:27 -05:00
Dustin Wilson	48d125e18a	Continuing work on TreeBuilder	2018-08-09 16:59:35 -05:00
Dustin Wilson	298decab24	Decouple ParseError from Parser	2018-08-03 23:08:18 -05:00
Dustin Wilson	222d60579c	Have Parser destroy its instance when finished • Getting ready to work on fragment parsing, simplifying Parser::parseFragment. • Added short example in README	2018-08-03 16:57:51 -05:00
Dustin Wilson	027e5b9f58	Moved tokenizer to its own class • Changed the name of the parser instance variable from Parser::$self to Parser::$instance • Added parse errors for entities into ParseError. • Moved Parser::fixDOM to DOM::fixIdAttributes. • Added an exception for when the tokenizer enters an invalid state (infinite looping). • Made ParseError use Parser::$instance->data instead of a passed around DataStream object.	2018-08-01 16:40:03 -05:00
Dustin Wilson	1fc65f85bd	Started HTML content tree building • Removed html5.php; shouldn't have been there to begin with. • Fixed bug where when feeding ParseError::trigger the wrong number of parameters it wouldn't have the correct exception to throw.	2018-07-26 16:30:29 -05:00
Dustin Wilson	de7cc7cbfa	Fixing foreign content stuff • Changes to the spec since the last edit required a rewrite of the tree building algorithm. • Searching the stack should search in reverse by default because the spec works that way. • Rewrote StartTagToken because the token attributes need to be easily editable; per the spec, foreign attributes are edited before the token goes through the element creation process, not after. • Yes, there's a goto. Sue me.	2018-07-25 09:57:27 -05:00
Dustin Wilson	6f74630c98	Begin Implementation of Tree Builder • Added parsing instructions for tokens in foreign content	2018-04-08 10:46:30 -05:00
Dustin Wilson	a89f6c9f09	Beginning Rewrite	2018-03-21 10:55:32 -05:00

37 commits