J. King
a441fc6a95
CS fix
6 years ago
J. King
3698aa8d8d
Tweaks and cleanup
6 years ago
J. King
84d103269f
30% improvement in performance for multibyte characters
6 years ago
J. King
3aaaae0c74
More performance improvements, and a regression fix
6 years ago
J. King
3cb49bbc77
Further performance improvements
6 years ago
J. King
aa58f619d7
Optimize for ASCII characters in ord()
This yields a 60% performance improvement on a typical HTML document
6 years ago
J. King
434e41cc2c
Initial round of decoding tests, with one fix
6 years ago
J. King
aa0d6ce20e
Split off UTF-8 tools from URL parser
6 years ago
J. King
30162e8525
Correct deficiencies in UTF-8 handling
The function now operates as defined by the WHATWG Encoding Standard. In practice:
- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
6 years ago
J. King
fd8c333a68
Split off UTF-8 processing into its own class, greatly expanded
Also simplified some parts of the algorithm implementation
Part of this simplification involves the use of goto statements
6 years ago