MensBeam/HTML-Parser

Author	SHA1	Message	Date
J. King	37aecf97ba	Remove scripted encoding test workaround. The test has been segregated, making the workaround unnecessary.	2020-09-16 18:23:39 -04:00
J. King	f72809d621	Relax dependence on ctype	2019-12-24 09:38:42 -05:00
J. King	28f0bbfe72	Suppress only one scripting test	2019-12-23 10:17:07 -05:00
J. King	1f3c33ad9e	Better coverage of BOM-based detection	2019-12-23 09:13:08 -05:00
J. King	21c9377b3a	Docblock for BOM detection	2019-12-23 08:43:15 -05:00
J. King	06e43504d0	Tweaks	2019-12-22 23:38:15 -05:00
J. King	164e5ff1e8	Add standard charset detection tests - Various new tests needed for full coverage, noted in comment - Byte Order Mark detection method added - Japanese encodings not yet supported, so tests marked incomplete - Tests requiring scripting suppressed	2019-12-22 22:51:18 -05:00
J. King	a7e1083681	Prototype character encoding detection	2019-12-22 13:36:59 -05:00
J. King	c1162f962f	Add missing test	2019-12-21 19:28:48 -05:00
J. King	2aa6bb2dea	Remove unnecessary test abstraction	2019-12-21 15:05:42 -05:00
J. King	49f31015ac	Start on character encoding detection	2019-12-21 14:53:51 -05:00
J. King	318d7bd7ad	Patch remaining test failures away	2019-12-20 11:48:14 -05:00
J. King	00bf9974c5	Fix up most error reporting positions	2019-12-19 22:28:11 -05:00
J. King	58a1177888	Address errors and omissions in error emission. One test still fails, though it is arguably immaterial. This does not account for line and column number, which are known to be mostly off by one.	2019-12-19 15:13:20 -05:00
J. King	ec199f4f11	Report input stream errors	2019-12-18 21:10:18 -05:00
J. King	9560358021	Character consumption cleanup - Newline normalization now done on-the-fly - Consequently, original input string is used as-is - Byte order mark is not supposed to be skipped - Use more straightforward method of tracking column position - Simplify backtracking when spanning - Genericize character interpretation: this will be expanded to emit illegal-character parse errors when appropriate	2019-12-18 18:03:47 -05:00
J. King	1ed679c50d	Pass through surrogate characters. This fixes the last four failing tests.	2019-12-18 15:15:02 -05:00
J. King	5a12fa8ad7	Tidying	2019-12-17 17:08:19 -05:00
J. King	ff4447e986	Include pending spec changes tests	2019-12-17 13:58:54 -05:00
J. King	e8b3c76046	Fix most failures. Also removed assertions.	2019-12-17 13:47:53 -05:00
J. King	59456b078f	Fix consuming of overlong entity	2019-12-17 12:32:29 -05:00
J. King	e8f35e92fb	Character reference fixes. One test in the "entities.test" file is still failing.	2019-12-16 23:41:44 -05:00
J. King	b9b892e6a6	Remove obsolete character reference consumer	2019-12-16 22:56:47 -05:00
J. King	19fb541806	New from-scratch character reference consumer	2019-12-16 22:39:16 -05:00
J. King	67c7f382e2	Prep for character references - Add missing state constants - Break all existing deviations for character refs - Add assertions before use of $attribute - Also fix DOCTYPE state	2019-12-15 22:20:20 -05:00
J. King	d4a7280405	Renumber states to match specification sections	2019-12-15 21:22:45 -05:00
J. King	4759f94771	Trim whitespace	2019-12-15 21:14:55 -05:00
J. King	cf41984e88	Fix comment end state	2019-12-15 21:13:10 -05:00
J. King	43f380c1f9	Fix EOF and end tags - End tags now emit errors if they have attributes - End tags now emit errors if they are self-closing - The last character before EOF is now correctly reconsumed. Also changed the tokenizer debug log to be zero-cost.	2019-12-15 19:45:59 -05:00
J. King	d08438052a	Baseline pass over tokenizer - Implemented missing states (except entity and char ref states) - Re-copied and reformatted most text from the specification - Emitted parse errors per spec (except invalid characters) - Properly handled null characters - Passed through invalid characters (these do not yet emit errors) - Added assertions before manipulation of tokens and temporary buffers - Removed problematic optimizations - Removed explicit continue statements - Allowed end tags to have attributes - Simplified duplicate attribute detection - Corrected DOCTYPE properties not being "missing" - Skipped BOM in encoding-neutral way. I may have introduced regressions, and the assertions are mostly serving to mask undefined-variable errors rather than helping to fix them, but at least warnings and notices are not being spammed this way. Work still needs to be done in emitting errors for invalid characters (and invalid character sequences), as well as in consuming character references and entities correctly, not to mention general debugging.	2019-12-15 17:47:45 -05:00
J. King	4e4aee2edd	Update intl dependency	2019-12-13 12:13:44 -05:00
Dustin Wilson	a0c3883363	Another infinite loop in Tokenizer caused by Data	2019-12-12 22:45:13 -06:00
J. King	49820afe7d	Fix broken assertion	2019-12-12 21:50:44 -05:00
J. King	362bb00158	Fix accidental exception instantiation loop	2019-12-12 21:06:05 -05:00
J. King	3c7a76bce1	Retrofit tree builder for new error emitter. Also fixed a number of undefined variable errors and erroneous non-root namespace references.	2019-12-12 18:10:01 -05:00
J. King	6b42f08fbc	Change some if-then-exception blocks to assertions. This has only been done in some parts of the code that are internal to the parser at large.	2019-12-12 17:35:24 -05:00
J. King	af57117c23	Silence parse errors for now	2019-12-12 15:43:16 -05:00
J. King	bb2a7b5a95	Rewrite how parse errors are handled. Everything which can emit a parse error should have the error handler and data stream as properties and use the ParseErrorEmitter trait to avoid complicating the task of actually producing an error. Normally the Parser would be expected to set the error handler before it begins (this commit does not do this) and unset it after it's done. Alternatively, the entire means of reporting errors can now be easily replaced.	2019-12-12 15:23:15 -05:00
J. King	d93fe25e58	Combine character tokens in test harness	2019-12-12 10:48:11 -05:00
J. King	8644b6c757	Explicitly index state names and error messages	2019-12-12 10:11:36 -05:00
J. King	51ac79128b	Multiple minor fixes	2019-12-11 23:28:32 -05:00
Dustin Wilson	30003fce1f	Fixed various issues with Data::consumeCharacterReference	2019-12-11 21:38:04 -06:00
J. King	5fb58054ff	Don't read past the end of $args	2019-12-11 20:49:24 -05:00
J. King	1beb934789	Add more tests	2019-12-11 20:49:24 -05:00
Dustin Wilson	ab507a177f	Data::consumeCharacterReference checked for false instead of empty string	2019-12-11 19:44:45 -06:00
J. King	c5a300655c	Ensure test data are present	2019-12-11 11:57:05 -05:00
J. King	223562e035	Fix known errors so far. Parse error class checks for correct number of arguments to its emitter and does seem to have correctly complained about the empty tag errors. Duplicate attributes should not be checked if the token is an end tag. Finally, used null coalescing to silence an undefined variable.	2019-12-11 00:18:39 -05:00
Dustin Wilson	64d8a2ab2c	Fixed infinite loop caused by Data::consumeWhile and consumeUntil	2019-12-10 22:48:02 -06:00
J. King	f360206a34	Basic endless loop helper	2019-12-10 23:20:50 -05:00
J. King	1386eb103c	Fix test transformer	2019-12-10 21:35:27 -05:00

206 commits