MensBeam/HTML-Parser

Author	SHA1	Message	Date
J. King	32f4cca039	Type hints for tree builder properties	2021-02-14 23:39:17 -05:00
J. King	5f1f02b552	Skip tests requiring unimplemented logic	2021-02-14 21:57:44 -05:00
J. King	8e7a0f6284	Clean up foreign content case normalization	2021-02-14 20:28:37 -05:00
J. King	6798c128e4	Correct unknown DOCTYPE checking	2021-02-14 19:33:23 -05:00
J. King	29fd5e2ccb	Fix invalid property accesses	2021-02-14 18:50:24 -05:00
J. King	1dc3d9c23e	Emit whitespace-only character tokens. This makes tree building simpler in certain circumstances.	2021-02-14 18:31:16 -05:00
J. King	504731cba0	Bring coverage backend up to date	2021-02-14 18:25:35 -05:00
J. King	9c3764da65	Stub of adoption agency	2021-02-14 15:44:05 -05:00
J. King	a8ff431370	Corrective pass over existing insertion modes	2021-02-14 15:09:00 -05:00
J. King	f8b9cf2c2b	Avoid implicit looping and switching. The while loop has been replaced with gotos where appropriate, and switching has been replaced with a series of if-blocks in line with the same logic in the tokenizer.	2021-02-13 14:37:04 -05:00
J. King	e3a271f06b	Fix first failure in tree builder	2021-02-13 10:29:07 -05:00
J. King	1fa2f701cb	Update section of tokenization in spec comments	2021-02-13 10:14:26 -05:00
J. King	4e5fd35775	Fix a few tree tests	2021-02-12 23:26:57 -05:00
J. King	bb4002abcb	Stub the tree builder properly	2021-02-12 22:46:10 -05:00
J. King	eea70eccd8	Test harness for tree construction	2021-02-12 21:05:48 -05:00
J. King	a35e8c8ae5	Update character decoders	2021-02-12 09:51:30 -05:00
J. King	ad0a8ae27a	Replace Content-Type parser with proper version	2020-09-25 12:34:53 -04:00
J. King	596a58eff1	Update tooling	2020-09-25 12:22:16 -04:00
J. King	0056e6cbc6	Support PCOV for coverage	2020-09-20 09:30:25 -04:00
J. King	4e79f378a8	Fix bug uncovered by new tests	2020-09-17 09:12:14 -04:00
J. King	269d0ecc64	Patch tests based on input not unstable identifier	2020-09-17 09:10:32 -04:00
J. King	37aecf97ba	Remove scripted encoding test workaround. The test has been segregated, making the workaround unnecessary.	2020-09-16 18:23:39 -04:00
J. King	f72809d621	Relax dependence on ctype	2019-12-24 09:38:42 -05:00
J. King	28f0bbfe72	Suppress only one scripting test	2019-12-23 10:17:07 -05:00
J. King	1f3c33ad9e	Better coverage of BOM-based detection	2019-12-23 09:13:08 -05:00
J. King	21c9377b3a	Docblock for BOM detection	2019-12-23 08:43:15 -05:00
J. King	06e43504d0	Tweaks	2019-12-22 23:38:15 -05:00
J. King	164e5ff1e8	Add standard charset detection tests: various new tests needed for full coverage, noted in comment; Byte Order Mark detection method added; Japanese encodings not yet supported, so tests marked incomplete; tests requiring scripting suppressed	2019-12-22 22:51:18 -05:00
J. King	a7e1083681	Prototype character encoding detection	2019-12-22 13:36:59 -05:00
J. King	c1162f962f	Add missing test	2019-12-21 19:28:48 -05:00
J. King	2aa6bb2dea	Remove unnecessary test abstraction	2019-12-21 15:05:42 -05:00
J. King	49f31015ac	Start on character encoding detection	2019-12-21 14:53:51 -05:00
J. King	318d7bd7ad	Patch remaining test failures away	2019-12-20 11:48:14 -05:00
J. King	00bf9974c5	Fix up most error reporting positions	2019-12-19 22:28:11 -05:00
J. King	58a1177888	Address errors and omissions in error emission. One test still fails, though it is arguably immaterial. This does not account for line and column number, which are known to be mostly off by one.	2019-12-19 15:13:20 -05:00
J. King	ec199f4f11	Report input stream errors	2019-12-18 21:10:18 -05:00
J. King	9560358021	Character consumption cleanup: newline normalization now done on the fly; consequently, original input string is used as-is; byte order mark is not supposed to be skipped; use more straightforward method of tracking column position; simplify backtracking when spanning; genericize character interpretation (this will be expanded to emit illegal-character parse errors when appropriate)	2019-12-18 18:03:47 -05:00
J. King	1ed679c50d	Pass through surrogate characters. This fixes the last four failing tests.	2019-12-18 15:15:02 -05:00
J. King	5a12fa8ad7	Tidying	2019-12-17 17:08:19 -05:00
J. King	ff4447e986	Include pending spec changes tests	2019-12-17 13:58:54 -05:00
J. King	e8b3c76046	Fix most failures. Also removed assertions.	2019-12-17 13:47:53 -05:00
J. King	59456b078f	Fix consuming of overlong entity	2019-12-17 12:32:29 -05:00
J. King	e8f35e92fb	Character reference fixes. One test in the "entities.test" file is still failing.	2019-12-16 23:41:44 -05:00
J. King	b9b892e6a6	Remove obsolete character reference consumer	2019-12-16 22:56:47 -05:00
J. King	19fb541806	New from-scratch character reference consumer	2019-12-16 22:39:16 -05:00
J. King	67c7f382e2	Prep for character references: add missing state constants; break all existing deviations for character refs; add assertions before use of $attribute; also fix DOCTYPE state	2019-12-15 22:20:20 -05:00
J. King	d4a7280405	Renumber states to match specification sections	2019-12-15 21:22:45 -05:00
J. King	4759f94771	Trim whitespace	2019-12-15 21:14:55 -05:00
J. King	cf41984e88	Fix comment end state	2019-12-15 21:13:10 -05:00
J. King	43f380c1f9	Fix EOF and end tags: end tags now emit errors if they have attributes; end tags now emit errors if they are self-closing; the last character before EOF is now correctly reconsumed. Also changed the tokenizer debug log to be zero-cost.	2019-12-15 19:45:59 -05:00

127 commits