J. King
fdbeecdb17
Add name and label to x-user-defined
6 years ago
J. King
d5327a3b83
Implement x-user-defined decoder
Also further refactored tests to better account for one-way encodings
6 years ago
J. King
dd9bed2e84
Implement UTF-16
6 years ago
J. King
a0bf8a9b05
Don't check for dirty EOF on every iteration
6 years ago
J. King
e683167905
Style fixes
Because of the large arrays in the GBCommon class and its test suite,
memory limits had to be disabled in php-cs-fixer
6 years ago
J. King
1449fae908
Refactor UTF-8 seeking
6 years ago
J. King
4c686aa8a1
Complete battery of tests for gb18030
6 years ago
J. King
1b9889914a
Fix numerous bugs with gb18030
6 years ago
J. King
467c565e8c
Implement gb18030 seeking
Also fix some bugs in EOF handling
6 years ago
J. King
40d0054bd1
Implement gb18030 and GBK encoders
6 years ago
J. King
766643aa37
Common infrastructure for gb18030 and GBK
6 years ago
J. King
d6747532cd
Implement gb18030 decoder
6 years ago
J. King
3a19b93aab
Move nextChar to generic class
6 years ago
J. King
3ee653307c
Implement all other single-byte encodings
6 years ago
J. King
269ecf4a96
Style fixes
6 years ago
J. King
7de6d7a6fc
Implement ISO-8859-6 single-byte encoding
Other single-byte encodings to follow
6 years ago
J. King
8c97b42303
Define interfaces for encodings
6 years ago
J. King
d8af9600ee
Clarified docstrings
6 years ago
J. King
540d8a237e
Style fixes
6 years ago
J. King
3920f11e22
Clean up
6 years ago
J. King
e2c4136001
Change iterator to a set of generators
Not only is this faster than a classical iterator (though still not as
fast as a while loop), but it also offers the choice of characters
or code points.
6 years ago
J. King
7f2c11dcc2
Make the iterator iterate over code points rather than characters
Also fix performance measurement for the iterator; it was all wrong.
6 years ago
J. King
cb1cab9d84
Implement fatal replacement mode
6 years ago
J. King
4ca07befe5
Change API symbols for greater consistency and clarity
6 years ago
J. King
ccf1fe180a
More safely back up state
6 years ago
J. King
8f7a7ed49e
Add basic iterator implementation
6 years ago
J. King
e12fc0d77f
Reorganize namespaces anticipating future internationalization tasks
6 years ago
J. King
72291b5f0d
Implement string length
6 years ago
J. King
5c21a3634c
Implement peeking
6 years ago
J. King
41a3a7bb5e
Clean up
6 years ago
J. King
88497ddc41
Remove functional interface
The maintenance burden is not worth the advantages it provides in
limited situations.
Moreover, if other decoders are to be implemented, most multi-byte
schemes would not be able to support a functional interface of similar
simplicity, and single-byte schemes wouldn't benefit much.
6 years ago
J. King
ca91a86744
Clean up static-method interface and test it
6 years ago
J. King
b32b1ec038
Style fixes
6 years ago
J. King
6fd50f0681
Ensure char and byte positions never go beyond the end of the string
6 years ago
J. King
9fba89ebda
Tested seeking
6 years ago
J. King
a99702d4ab
More robust self-synchronization
6 years ago
J. King
b871c4f2fd
Implement seeking backward through a string
6 years ago
J. King
ac5e91f843
Restore deleted portion of functional interface
Also added comparative performance measurement
6 years ago
J. King
1ed3c36a65
Start on alternate object-based interface
This is both simpler and slightly faster, yielding a performance improvement of between 2% and 5%
6 years ago
J. King
a441fc6a95
CS fix
7 years ago
J. King
3698aa8d8d
Tweaks and cleanup
7 years ago
J. King
84d103269f
30% improvement in performance for multibyte characters
7 years ago
J. King
3aaaae0c74
More performance improvements, and a regression fix
7 years ago
J. King
3cb49bbc77
Further performance improvements
7 years ago
J. King
aa58f619d7
Optimize for ASCII characters in ord()
This yields a 60% performance improvement on a typical HTML document
7 years ago
J. King
434e41cc2c
Initial round of decoding tests, with one fix
7 years ago
J. King
aa0d6ce20e
Split off UTF-8 tools from URL parser
7 years ago
J. King
30162e8525
Correct deficiencies in UTF-8 handling
The function now operates as defined by the WHATWG Encoding Standard; the practical implications of this are that:
- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
7 years ago
J. King
7d13a6c3b7
Four more states
7 years ago
J. King
80975d595e
Implement relative state; slight refactor
7 years ago