32f4cca039
Type hints for tree builder properties
2021-02-14 23:39:17 -05:00
5f1f02b552
Skip tests requiring unimplemented logic
2021-02-14 21:57:44 -05:00
8e7a0f6284
Clean up foreign content case normalization
2021-02-14 20:28:37 -05:00
6798c128e4
Correct unknown DOCTYPE checking
2021-02-14 19:33:23 -05:00
29fd5e2ccb
Fix invalid property accesses
2021-02-14 18:50:24 -05:00
1dc3d9c23e
Emit whitespace-only character tokens
...
This makes tree building simpler in certain circumstances
2021-02-14 18:31:16 -05:00
504731cba0
Bring coverage backend up to date
2021-02-14 18:25:35 -05:00
9c3764da65
Stub of adoption agency
2021-02-14 15:44:05 -05:00
a8ff431370
Corrective pass over existing insertion modes
2021-02-14 15:09:00 -05:00
f8b9cf2c2b
Avoid implicit looping and switching
...
The while loop has been replaced with gotos where appropriate, and
switching has been replaced with a series of if-blocks in line with the
same logic in the tokenizer.
2021-02-13 14:37:04 -05:00
e3a271f06b
Fix first failure in tree builder
2021-02-13 10:29:07 -05:00
1fa2f701cb
Update section of tokenization in spec comments
2021-02-13 10:14:26 -05:00
4e5fd35775
Fix a few tree tests
2021-02-12 23:26:57 -05:00
bb4002abcb
Stub the tree builder properly
2021-02-12 22:46:10 -05:00
eea70eccd8
Test harness for tree construction
2021-02-12 21:05:48 -05:00
a35e8c8ae5
Update character decoders
2021-02-12 09:51:30 -05:00
ad0a8ae27a
Replace Content-Type parser with proper version
2020-09-25 12:34:53 -04:00
596a58eff1
Update tooling
2020-09-25 12:22:16 -04:00
0056e6cbc6
Support PCOV for coverage
2020-09-20 09:30:25 -04:00
4e79f378a8
Fix bug uncovered by new tests
2020-09-17 09:12:14 -04:00
269d0ecc64
Patch tests based on input, not unstable identifier
2020-09-17 09:10:32 -04:00
37aecf97ba
Remove scripted encoding test workaround
...
The test has been segregated, making the workaround unnecessary
2020-09-16 18:23:39 -04:00
f72809d621
Relax dependence on ctype
2019-12-24 09:38:42 -05:00
28f0bbfe72
Suppress only one scripting test
2019-12-23 10:17:07 -05:00
1f3c33ad9e
Better coverage of BOM-based detection
2019-12-23 09:13:08 -05:00
21c9377b3a
Docblock for BOM detection
2019-12-23 08:43:15 -05:00
06e43504d0
Tweaks
2019-12-22 23:38:15 -05:00
164e5ff1e8
Add standard charset detection tests
...
- Various new tests needed for full coverage, noted in comment
- Byte Order Mark detection method added
- Japanese encodings not yet supported, so tests marked incomplete
- Tests requiring scripting suppressed
2019-12-22 22:51:18 -05:00
a7e1083681
Prototype character encoding detection
2019-12-22 13:36:59 -05:00
c1162f962f
Add missing test
2019-12-21 19:28:48 -05:00
2aa6bb2dea
Remove unnecessary test abstraction
2019-12-21 15:05:42 -05:00
49f31015ac
Start on character encoding detection
2019-12-21 14:53:51 -05:00
318d7bd7ad
Patch remaining test failures away
2019-12-20 11:48:14 -05:00
00bf9974c5
Fix up most error reporting positions
2019-12-19 22:28:11 -05:00
58a1177888
Address errors and omissions in error emission
...
One test still fails, though it is arguably immaterial. This does not
account for line and column numbers, which are known to be mostly
off by one.
2019-12-19 15:13:20 -05:00
ec199f4f11
Report input stream errors
2019-12-18 21:10:18 -05:00
9560358021
Character consumption cleanup
...
- Newline normalization now done on-the-fly
- Consequently, original input string is used as-is
- Byte order mark is not supposed to be skipped
- Use more straightforward method of tracking column position
- Simplify backtracking when spanning
- Genericize character interpretation: this will be expanded to emit
illegal-character parse errors when appropriate
2019-12-18 18:03:47 -05:00
1ed679c50d
Pass through surrogate characters
...
This fixes the last four failing tests
2019-12-18 15:15:02 -05:00
5a12fa8ad7
Tidying
2019-12-17 17:08:19 -05:00
ff4447e986
Include pending spec changes tests
2019-12-17 13:58:54 -05:00
e8b3c76046
Fix most failures
...
Also removed assertions
2019-12-17 13:47:53 -05:00
59456b078f
Fix consuming of overlong entity
2019-12-17 12:32:29 -05:00
e8f35e92fb
Character reference fixes
...
One test in the "entities.test" file is still failing
2019-12-16 23:41:44 -05:00
b9b892e6a6
Remove obsolete character reference consumer
2019-12-16 22:56:47 -05:00
19fb541806
New from-scratch character reference consumer
2019-12-16 22:39:16 -05:00
67c7f382e2
Prep for character references
...
- Add missing state constants
- Break all existing deviations for character refs
- Add assertions before use of $attribute
- Also fix DOCTYPE state
2019-12-15 22:20:20 -05:00
d4a7280405
Renumber states to match specification sections
2019-12-15 21:22:45 -05:00
4759f94771
Trim whitespace
2019-12-15 21:14:55 -05:00
cf41984e88
Fix comment end state
2019-12-15 21:13:10 -05:00
43f380c1f9
Fix EOF and end tags
...
- End tags now emit errors if they have attributes
- End tags now emit errors if they are self-closing
- The last character before EOF is now correctly reconsumed
Also changed the tokenizer debug log to be zero-cost
2019-12-15 19:45:59 -05:00