Commit graph

127 commits

Author SHA1 Message Date
32f4cca039 Type hints for tree builder properties 2021-02-14 23:39:17 -05:00
5f1f02b552 Skip tests requiring unimplemented logic 2021-02-14 21:57:44 -05:00
8e7a0f6284 Clean up foreign content case normalization 2021-02-14 20:28:37 -05:00
6798c128e4 Correct unknown DOCTYPE checking 2021-02-14 19:33:23 -05:00
29fd5e2ccb Fix invalid property accesses 2021-02-14 18:50:24 -05:00
1dc3d9c23e Emit whitespace-only character tokens
This makes tree building simpler in certain circumstances
2021-02-14 18:31:16 -05:00
504731cba0 Bring coverage backend up to date 2021-02-14 18:25:35 -05:00
9c3764da65 Stub of adoption agency 2021-02-14 15:44:05 -05:00
a8ff431370 Corrective pass over existing insertion modes 2021-02-14 15:09:00 -05:00
f8b9cf2c2b Avoid implicit looping and switching
The while loop has been replaced with gotos where appropriate, and
switching has been replaced with a series of if-blocks in line with the
same logic in the tokenizer.
2021-02-13 14:37:04 -05:00
e3a271f06b Fix first failure in tree builder 2021-02-13 10:29:07 -05:00
1fa2f701cb Update section of tokenization in spec comments 2021-02-13 10:14:26 -05:00
4e5fd35775 Fix a few tree tests 2021-02-12 23:26:57 -05:00
bb4002abcb Stub the tree builder properly 2021-02-12 22:46:10 -05:00
eea70eccd8 Test harness for tree construction 2021-02-12 21:05:48 -05:00
a35e8c8ae5 Update character decoders 2021-02-12 09:51:30 -05:00
ad0a8ae27a Replace Content-Type parser with proper version 2020-09-25 12:34:53 -04:00
596a58eff1 Update tooling 2020-09-25 12:22:16 -04:00
0056e6cbc6 Support PCOV for coverage 2020-09-20 09:30:25 -04:00
4e79f378a8 Fix bug uncovered by new tests 2020-09-17 09:12:14 -04:00
269d0ecc64 Patch tests based on input not unstable identifier 2020-09-17 09:10:32 -04:00
37aecf97ba Remove scripted encoding test workaround
The test has been segregated, making the workaround unnecessary
2020-09-16 18:23:39 -04:00
f72809d621 Relax dependence on ctype 2019-12-24 09:38:42 -05:00
28f0bbfe72 Suppress only one scripting test 2019-12-23 10:17:07 -05:00
1f3c33ad9e Better coverage of BOM-based detection 2019-12-23 09:13:08 -05:00
21c9377b3a Docblock for BOM detection 2019-12-23 08:43:15 -05:00
06e43504d0 Tweaks 2019-12-22 23:38:15 -05:00
164e5ff1e8 Add standard charset detection tests
- Various new tests needed for full coverage, noted in comment
- Byte Order Mark detection method added
- Japanese encodings not yet supported, so tests marked incomplete
- Tests requiring scripting suppressed
2019-12-22 22:51:18 -05:00
a7e1083681 Prototype character encoding detection 2019-12-22 13:36:59 -05:00
c1162f962f Add missing test 2019-12-21 19:28:48 -05:00
2aa6bb2dea Remove unnecessary test abstraction 2019-12-21 15:05:42 -05:00
49f31015ac Start on character encoding detection 2019-12-21 14:53:51 -05:00
318d7bd7ad Patch remaining test failures away 2019-12-20 11:48:14 -05:00
00bf9974c5 Fix up most error reporting positions 2019-12-19 22:28:11 -05:00
58a1177888 Address errors and omissions in error emission
One test still fails, though it is arguably immaterial. This does not
account for line and column number, which are known to be mostly
off by one.
2019-12-19 15:13:20 -05:00
ec199f4f11 Report input stream errors 2019-12-18 21:10:18 -05:00
9560358021 Character consumption cleanup
- Newline normalization now done on-the-fly
- Consequently, original input string is used as-is
- Byte order mark is not supposed to be skipped
- Use more straightforward method of tracking column position
- Simplify backtracking when spanning
- Genericize character interpretation: this will be expanded to emit
illegal-character parse errors when appropriate
2019-12-18 18:03:47 -05:00
1ed679c50d Pass through surrogate characters
This fixes the last four failing tests
2019-12-18 15:15:02 -05:00
5a12fa8ad7 Tidying 2019-12-17 17:08:19 -05:00
ff4447e986 Include pending spec changes tests 2019-12-17 13:58:54 -05:00
e8b3c76046 Fix most failures
Also removed assertions
2019-12-17 13:47:53 -05:00
59456b078f Fix consuming of overlong entity 2019-12-17 12:32:29 -05:00
e8f35e92fb Character reference fixes
One test in the "entities.test" file is still failing
2019-12-16 23:41:44 -05:00
b9b892e6a6 Remove obsolete character reference consumer 2019-12-16 22:56:47 -05:00
19fb541806 New from-scratch character reference consumer 2019-12-16 22:39:16 -05:00
67c7f382e2 Prep for character references
- Add missing state constants
- Break all existing deviations for character refs
- Add assertions before use of $attribute
- Also fix DOCTYPE state
2019-12-15 22:20:20 -05:00
d4a7280405 Renumber states to match specification sections 2019-12-15 21:22:45 -05:00
4759f94771 Trim whitespace 2019-12-15 21:14:55 -05:00
cf41984e88 Fix comment end state 2019-12-15 21:13:10 -05:00
43f380c1f9 Fix EOF and end tags
- End tags now emit errors if they have attributes
- End tags now emit errors if they are self-closing
- The last character before EOF is now correctly reconsumed

Also changed the tokenizer debug log to be zero-cost
2019-12-15 19:45:59 -05:00