J. King
cb1cab9d84
Implement fatal replacement mode
6 years ago
J. King
4ca07befe5
Change API symbols for greater consistency and clarity
6 years ago
J. King
ccf1fe180a
More safely back up state
6 years ago
J. King
8f7a7ed49e
Add basic iterator implementation
6 years ago
J. King
e12fc0d77f
Reorganize namespaces anticipating future internationalization tasks
6 years ago
J. King
72291b5f0d
Implement string length
6 years ago
J. King
5c21a3634c
Implement peeking
6 years ago
J. King
41a3a7bb5e
Clean up
6 years ago
J. King
88497ddc41
Remove functional interface
The maintenance burden is not worth the advantages it provides in
limited situations.
Moreover, if other decoders are to be implemented, most multi-byte
schemes would not be able to support a functional interface of similar
simplicity, and single-byte schemes wouldn't benefit much.
6 years ago
J. King
ca91a86744
Clean up static-method interface and test it
6 years ago
J. King
b32b1ec038
Style fixes
6 years ago
J. King
6fd50f0681
Ensure char and byte positions never go beyond the end of the string
6 years ago
J. King
9fba89ebda
Tested seeking
6 years ago
J. King
a99702d4ab
More robust self-synchronization
6 years ago
J. King
b871c4f2fd
Implement seeking backward through a string
6 years ago
J. King
ac5e91f843
Restore deleted portion of functional interface
Also added comparative performance measurement
6 years ago
J. King
1ed3c36a65
Start on alternate object-based interface
This is both simpler and slightly faster, improving performance by between 2% and 5%
6 years ago
J. King
a441fc6a95
CS fix
6 years ago
J. King
3698aa8d8d
Tweaks and cleanup
6 years ago
J. King
84d103269f
30% improvement in performance for multibyte characters
6 years ago
J. King
3aaaae0c74
More performance improvements, and a regression fix
6 years ago
J. King
3cb49bbc77
Further performance improvements
6 years ago
J. King
aa58f619d7
Optimize for ASCII characters in ord()
This yields a 60% performance improvement on a typical HTML document
6 years ago
J. King
434e41cc2c
Initial round of decoding tests, with one fix
6 years ago
J. King
aa0d6ce20e
Split off UTF-8 tools from URL parser
6 years ago
J. King
30162e8525
Correct deficiencies in UTF-8 handling
The function now operates as defined by the WHATWG Encoding Standard; in practice:
- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
6 years ago
J. King
7d13a6c3b7
Four more states
6 years ago
J. King
80975d595e
Implement relative state; slight refactor
6 years ago
J. King
fd8c333a68
Split off UTF-8 processing into its own class, greatly expanded
Also simplified some parts of the algorithm implementation
Part of this simplification involves the use of goto statements
6 years ago
J. King
42dfd0171f
Process UTF-8 characters rather than single bytes
6 years ago
J. King
23fd5872f6
Minor clarifications
6 years ago
J. King
9786d25aa5
Initial commit with a few states; not yet tested
6 years ago