Commit graph

7 commits

Author SHA1 Message Date
30162e8525 Correct deficiencies in UTF-8 handling
Function now operates as defined by the WHATWG encoding standard; the practical implications of this are that:

- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
2018-04-22 22:35:03 -04:00
7d13a6c3b7 Four more states 2018-04-18 11:45:09 -04:00
80975d595e Implement relative state; slight refactor 2018-04-11 15:29:47 -04:00
fd8c333a68 Split off UTF-8 processing into its own class, greatly expanded
Also simplified some parts of the algorithm implementation

Part of this simplification involves the use of goto statements
2018-04-10 17:58:09 -04:00
42dfd0171f Process UTF-8 characters rather than single bytes 2018-04-09 15:43:28 -04:00
23fd5872f6 Minor clarifications 2018-04-08 20:19:59 -04:00
9786d25aa5 Initial commit with a few states; not yet tested 2018-04-08 20:10:17 -04:00
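
Note on 30162e8525: the behaviour that commit message describes (stricter detection of invalid sequences, special handling of overlong encodings) can be illustrated with a minimal sketch of the UTF-8 decoder algorithm from the WHATWG Encoding Standard. This is not the repository's code. The function name `decode_utf8` and the choice of Python are assumptions made for illustration only. In this sketch, overlong forms and surrogate encodings are replaced with U+FFFD, which is how the WHATWG decoder treats them; whether "normalized" in the commit message means exactly this is also an assumption.

```python
REPLACEMENT = 0xFFFD  # U+FFFD, emitted for each decoding error


def decode_utf8(data: bytes) -> list[int]:
    """Decode bytes to code points, following the WHATWG UTF-8 decoder steps.

    Hypothetical helper for illustration; not taken from the repository.
    """
    out = []
    code_point = needed = seen = 0
    lower, upper = 0x80, 0xBF  # allowed range for the next continuation byte
    i = 0
    while i < len(data):
        b = data[i]
        if needed == 0:
            if b <= 0x7F:
                out.append(b)
            elif 0xC2 <= b <= 0xDF:  # 0xC0/0xC1 would start overlong forms
                needed, code_point = 1, b & 0x1F
            elif 0xE0 <= b <= 0xEF:
                if b == 0xE0:
                    lower = 0xA0  # rejects overlong three-byte forms
                elif b == 0xED:
                    upper = 0x9F  # rejects UTF-16 surrogates
                needed, code_point = 2, b & 0x0F
            elif 0xF0 <= b <= 0xF4:
                if b == 0xF0:
                    lower = 0x90  # rejects overlong four-byte forms
                elif b == 0xF4:
                    upper = 0x8F  # rejects values above U+10FFFF
                needed, code_point = 3, b & 0x07
            else:
                out.append(REPLACEMENT)
            i += 1
            continue
        if not (lower <= b <= upper):
            # Invalid continuation: report an error, reset, and reprocess
            # the same byte as the possible start of a new sequence.
            code_point = needed = seen = 0
            lower, upper = 0x80, 0xBF
            out.append(REPLACEMENT)
            continue
        lower, upper = 0x80, 0xBF
        code_point = (code_point << 6) | (b & 0x3F)
        seen += 1
        if seen == needed:
            out.append(code_point)
            code_point = needed = seen = 0
        i += 1
    if needed != 0:
        out.append(REPLACEMENT)  # input ended in the middle of a sequence
    return out
```

For example, `decode_utf8(b"\xC0\xAF")`, an overlong two-byte encoding of "/", yields two U+FFFD code points rather than 0x2F, while `decode_utf8("€".encode("utf-8"))` yields `[0x20AC]`.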