7f2c11dcc2
Make the iterator iterate over code points rather than characters
...
Also fix performance measurement for the iterator; it was all wrong.
2018-08-10 10:39:09 -04:00
cb1cab9d84
Implement fatal replacement mode
2018-08-10 09:16:37 -04:00
1cd9c33c85
Composer tweaks
2018-08-10 08:24:03 -04:00
4ca07befe5
Change API symbols for greater consistency and clarity
2018-08-09 13:29:28 -04:00
ccf1fe180a
More safely back up state
2018-08-03 13:58:00 -04:00
8f7a7ed49e
Add basic iterator implementation
2018-08-03 11:50:57 -04:00
e12fc0d77f
Reorganize namespaces anticipating future internationalization tasks
2018-08-02 16:17:34 -04:00
51f7bbc5f7
Test encoder
2018-08-02 16:06:11 -04:00
72291b5f0d
Implement string length
2018-08-02 15:53:49 -04:00
5c21a3634c
Implement peeking
2018-08-02 15:17:17 -04:00
41a3a7bb5e
Clean up
2018-08-02 14:46:23 -04:00
88497ddc41
Remove functional interface
...
The maintenance burden is not worth the advantages it provides in
limited situations.
Moreover, if other decoders are to be implemented, most multi-byte
schemes would not be able to support a functional interface of similar
simplicity, and single-byte schemes wouldn't benefit much
2018-08-02 14:19:03 -04:00
7409520477
Make performance comparison fairer
...
The Intl tests avoided one million user-function calls by doing a simple
loop, something unlikely to be used in real-world situations; wrapping
the test in a generator adds the overhead one would expect to have,
making the pure PHP implementation much more competitive
2018-08-01 14:51:05 -04:00
ca91a86744
Clean up static-method interface and test it
2018-07-28 22:34:32 -04:00
34eee5fcc3
Rename test case file
2018-07-27 22:19:53 -04:00
b32b1ec038
Style fixes
2018-07-27 22:11:53 -04:00
6fd50f0681
Ensure char and byte positions never go beyond the end of the string
2018-07-27 19:11:33 -04:00
9fba89ebda
Tested seeking
2018-07-27 18:57:53 -04:00
a99702d4ab
More robust self-synchronization
2018-07-24 08:45:42 -04:00
c11da3ac6b
Remove now unnecessary data generator
2018-07-22 12:17:44 -04:00
b871c4f2fd
Implement seeking backward through a string
2018-07-21 19:59:56 -04:00
ac5e91f843
Restore deleted portion of functional interface
...
Also added comparative performance measurement
2018-07-06 07:35:50 -04:00
1ed3c36a65
Start on alternate object-based interface
...
This is both simpler and slightly faster, yielding a 2% to 5% performance improvement
2018-05-05 14:32:03 -04:00
69a194ecf8
More useful performance test output
2018-04-25 22:58:32 -04:00
a441fc6a95
CS fix
2018-04-25 15:00:05 -04:00
3698aa8d8d
Tweaks and cleanup
2018-04-25 14:54:44 -04:00
84d103269f
30% improvement in performance for multibyte characters
2018-04-25 13:58:52 -04:00
e755699dd7
Changed performance test data
2018-04-25 13:52:45 -04:00
3aaaae0c74
More performance improvements, and a regression fix
2018-04-25 09:51:19 -04:00
3cb49bbc77
Further performance improvements
2018-04-25 00:54:44 -04:00
6a97da7435
Reduced number of performance tests
2018-04-25 00:54:20 -04:00
cd68883d07
Add a performance profiling script
2018-04-25 00:42:29 -04:00
aa58f619d7
Optimize for ASCII characters in ord()
...
This yields a 60% performance improvement on a typical HTML document
2018-04-24 20:46:17 -04:00
434e41cc2c
Initial round of decoding tests, with one fix
2018-04-24 16:26:29 -04:00
b725fddc6c
Clean up Robofile
2018-04-24 14:54:50 -04:00
9062f4e6a6
Add infrastructure required for tests
2018-04-23 14:21:25 -04:00
aa0d6ce20e
Split off UTF-8 tools from URL parser
2018-04-23 11:04:40 -04:00
30162e8525
Correct deficiencies in UTF-8 handling
...
Function now operates as defined by the WHATWG encoding standard; the practical implications of this are that:
- More invalid sequences are correctly identified as invalid
- Overlong encodings are normalized
- ord() and chr() functions have been added as a consequence of this work
2018-04-22 22:35:03 -04:00
7d13a6c3b7
Four more states
2018-04-18 11:45:09 -04:00
80975d595e
Implement relative state; slight refactor
2018-04-11 15:29:47 -04:00
fd8c333a68
Split off UTF-8 processing into its own class, greatly expanded
...
Also simplified some parts of the algorithm implementation
Part of this simplification involves the use of goto statements
2018-04-10 17:58:09 -04:00
42dfd0171f
Process UTF-8 characters rather than single bytes
2018-04-09 15:43:28 -04:00
23fd5872f6
Minor clarifications
2018-04-08 20:19:59 -04:00
9786d25aa5
Initial commit with a few states; not yet tested
2018-04-08 20:10:17 -04:00