J. King
7f2c11dcc2
Make the iterator iterate over code points rather than characters
Also fix performance measurement for the iterator; it was all wrong.
6 years ago
J. King
cb1cab9d84
Implement fatal replacement mode
6 years ago
J. King
1cd9c33c85
Composer tweaks
6 years ago
J. King
4ca07befe5
Change API symbols for greater consistency and clarity
6 years ago
J. King
ccf1fe180a
More safely back up state
6 years ago
J. King
8f7a7ed49e
Add basic iterator implementation
6 years ago
J. King
e12fc0d77f
Reorganize namespaces anticipating future internationalization tasks
6 years ago
J. King
51f7bbc5f7
Test encoder
6 years ago
J. King
72291b5f0d
Implement string length
6 years ago
J. King
5c21a3634c
Implement peeking
6 years ago
J. King
41a3a7bb5e
Clean up
6 years ago
J. King
88497ddc41
Remove functional interface
The maintenance burden is not worth the advantages it provides in
limited situations.
Moreover, if other decoders are to be implemented, most multi-byte
schemes would not be able to support a functional interface of similar
simplicity, and single-byte schemes wouldn't benefit much
6 years ago
J. King
7409520477
Make performance comparison fairer
The Intl tests avoided one million user-function calls by doing a simple
loop, something unlikely to be used in real-world situations; wrapping
the test in a generator adds the overhead one would expect to have,
making the pure PHP implementation much more competitive
6 years ago
J. King
ca91a86744
Clean up static-method interface and test it
6 years ago
J. King
34eee5fcc3
Rename test case file
6 years ago
J. King
b32b1ec038
Style fixes
6 years ago
J. King
6fd50f0681
Ensure char and byte positions never go beyond the end of the string
6 years ago
J. King
9fba89ebda
Tested seeking
6 years ago
J. King
a99702d4ab
More robust self-synchronization
6 years ago
J. King
c11da3ac6b
Remove now unnecessary data generator
6 years ago
J. King
b871c4f2fd
Implement seeking backward through a string
6 years ago
J. King
ac5e91f843
Restore deleted portion of functional interface
Also added comparative performance measurement
6 years ago
J. King
1ed3c36a65
Start on alternate object-based interface
This is both simpler and slightly faster, improving performance by between 2% and 5%
6 years ago
J. King
69a194ecf8
More useful performance test output
7 years ago
J. King
a441fc6a95
CS fix
7 years ago
J. King
3698aa8d8d
Tweaks and cleanup
7 years ago
J. King
84d103269f
30% improvement in performance for multibyte characters
7 years ago
J. King
e755699dd7
Changed performance test data
7 years ago
J. King
3aaaae0c74
More performance improvements, and a regression fix
7 years ago
J. King
3cb49bbc77
Further performance improvements
7 years ago
J. King
6a97da7435
Reduced number of performance tests
7 years ago
J. King
cd68883d07
Add a performance profiling script
7 years ago
J. King
aa58f619d7
Optimize for ASCII characters in ord()
This yields a 60% performance improvement on a typical HTML document
7 years ago
J. King
434e41cc2c
Initial round of decoding tests, with one fix
7 years ago
J. King
b725fddc6c
Clean up Robofile
7 years ago
J. King
9062f4e6a6
Add infrastructure required for tests
7 years ago
J. King
aa0d6ce20e
Split off UTF-8 tools from URL parser
7 years ago
J. King
30162e8525
Correct deficiencies in UTF-8 handling
Function now operates as defined by the WHATWG encoding standard; the practical implications of this are that:
- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
7 years ago
J. King
7d13a6c3b7
Four more states
7 years ago
J. King
80975d595e
Implement relative state; slight refactor
7 years ago
J. King
fd8c333a68
Split off UTF-8 processing into its own class, greatly expanded
Also simplified some parts of the algorithm implementation
Part of this simplification involves the use of goto statements
7 years ago
J. King
42dfd0171f
Process UTF-8 characters rather than single bytes
7 years ago
J. King
23fd5872f6
Minor clarifications
7 years ago
J. King
9786d25aa5
Initial commit with a few states; not yet tested
7 years ago