This uncovered three bugs:
- Unconsuming at EOF with error reporting off yielded incorrect tokens
- Mangled end tags in foreign content were not being matched correctly
- Broken characters produced a fatal error with error reporting on
- Various new tests needed for full coverage, noted in comment
- Byte Order Mark detection methopd added
- Japanese encodings nt yet supported, so tests marked incomplete
- Tests requiring scripting suppressed