Commit graph

178 commits

Author SHA1 Message Date
143590cb53 Hopefully less incorrect spanning for ISO-2022-JP 2021-03-24 18:36:58 -04:00
e5aac0b409 Improved spanning for ISO-2022-JP 2021-03-24 15:20:34 -04:00
d9d92e5e77 Test all spanning other than ISO-2022-JP
ISO-2022-JP will require a more careful implementation to deal with
mode changes to ASCII or Roman mode
2021-03-24 09:36:16 -04:00
81186973f1 Partial tests for ASCII spanning 2021-03-24 08:59:59 -04:00
c64a43992b Prototype span test 2021-03-23 19:03:16 -04:00
60a5487e46 Fix spanning with single-byte encodings 2021-03-16 18:58:18 -04:00
cc9c937810 Don't rely on PHP 8 signature changes 2021-03-12 23:01:37 -05:00
bf81571ce4 Prototype strspn equivalent 2021-03-12 18:29:07 -05:00
7327e55a50 Update tooling 2021-03-12 18:23:57 -05:00
95d573c014 Update changelog 2021-03-06 22:55:19 -05:00
2029cd2820 Validate for PHP 8 2021-03-06 22:53:04 -05:00
5c8116afb8 Prepare release 2020-10-27 19:17:36 -04:00
4539e56e87 Merge branch 'multi-byte' into master 2020-10-25 10:56:38 -04:00
87ec30a375 Explicit constant visibility
Also partially revert change to encoder determination
2020-10-24 22:53:12 -04:00
600379a4dd Fill out API documentation 2020-10-24 14:24:23 -04:00
c234702cce Speed up encoding; make ISO 2022-JP more consistent
- The ISO 2022-JP encoder is now static as with all others; this is
slightly slower, but localises the encoder logic to its class
- Indexed encoders now cache pointer tables on first use, yielding
significant performance benefits
- Encoding multiple characters now uses fewer function calls, yielding
moderate performance benefits at the expense of slight complication
2020-10-19 23:12:45 -04:00
efdac91b30 Optimize ISO 2022-JP encoder 2020-10-19 19:08:43 -04:00
be2134cc71 API re-organization 2020-10-18 15:32:49 -04:00
464bc4a0a9 Specify PHP 7.1 requirement 2020-10-17 17:55:26 -04:00
cde4100b8a Make correct termination of an ISO 2022-JP output string easier 2020-10-17 14:13:36 -04:00
808b4128dd Tests for replacement encoding; readme correction 2020-10-17 13:52:32 -04:00
ffa3f431d6 Coverage fixes 2020-10-16 20:35:27 -04:00
d580e93e52 ISO 2022-JP encoder tests and fixes 2020-10-16 20:13:53 -04:00
10328b6806 Tests for general encoder 2020-10-15 16:19:57 -04:00
db738bba99 Encoder for x-user-defined 2020-10-15 12:34:03 -04:00
a57dde6dbd Style fixes 2020-10-15 10:39:44 -04:00
4299bf0100 Pre-emptively update changelog 2020-10-15 10:37:11 -04:00
16f411c767 Prototype ISO 2022-JP encoder
The encoder currently operates only on single code points, but will later be
expanded to operate on iterables to construct complete strings. For encodings
other than ISO 2022-JP this is merely a convenience, but the algorithm for
that encoding mandates that encoded strings terminate in a switch to ASCII
mode, which a single-character encoder cannot accomplish by itself.
2020-10-14 23:45:32 -04:00
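The termination requirement described in the message above can be illustrated with a minimal Python sketch. It is not the library's PHP code: the function name, the mode labels and the two-entry table are hypothetical, and Roman and Katakana modes and error handling are left out. It shows why only a string-level encoder can guarantee a final switch back to ASCII mode.

```python
ESC_ASCII = b"\x1b(B"    # escape sequence selecting ASCII mode
ESC_JIS0208 = b"\x1b$B"  # escape sequence selecting JIS X 0208 mode

# Hypothetical two-entry table (U+3042 and U+3044); a real encoder uses the full index.
TOY_JIS0208 = {0x3042: b"\x24\x22", 0x3044: b"\x24\x24"}


def encode_string(code_points):
    """Encode an iterable of code points, always finishing in ASCII mode."""
    out = bytearray()
    mode = "ascii"
    for cp in code_points:
        if cp < 0x80:
            # Switch back to ASCII only when the mode actually changes.
            if mode != "ascii":
                out += ESC_ASCII
                mode = "ascii"
            out.append(cp)
        else:
            if mode != "jis0208":
                out += ESC_JIS0208
                mode = "jis0208"
            # Raises KeyError for code points missing from the toy table.
            out += TOY_JIS0208[cp]
    # A single-character encoder cannot know that the string has ended,
    # so the closing escape has to be emitted at this level.
    if mode != "ascii":
        out += ESC_ASCII
    return bytes(out)
```

Because the mode is tracked across the whole iterable, consecutive JIS X 0208 characters share one escape sequence, and the closing escape is written once at the end.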
cdd1c0182b Corrected ISO 2022-JP decoder and seeker 2020-10-14 12:29:19 -04:00
9f7e496bf6 Plug potential memory leak 2020-10-13 18:58:53 -04:00
86c2b0d628 Fix coverage 2020-10-11 18:36:49 -04:00
2f3ad29ce6 Prototype ISO 2022-JP decoder 2020-10-10 23:08:51 -04:00
53b27d1a55 Correct buggy Shift_JIS tests 2020-10-09 09:47:28 -04:00
96846d061c Complete Shift_JIS testing 2020-10-08 19:22:32 -04:00
d45e0be7c3 Typo 2020-10-08 17:18:48 -04:00
915aa7ca93 Finally fix Shift_JIS seeker 2020-10-07 22:48:50 -04:00
4b2a396c64 Prototype for replacement encoding 2020-10-07 15:53:58 -04:00
ef9932ffcb Correct various ShiftJIS errors 2020-10-07 14:11:19 -04:00
d9b8cd8dd1 Fixes for multi-byte index-based encoders
- array_flip() retains the last duplicate, when we need the first
- Indexes are now prepared with a list of first-duplicate code points
to search before flipping
- This affected only U+3000 in GBK
- Big5 did not use array_flip(), but its list of override code points
did not include U+2561; Big5 now flips like the others
- EUC-JP had a long list of errors, but this encoding was not
previously released
- Shift_JIS' indexes are probably not correct, still
2020-10-07 11:28:21 -04:00
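The duplicate-pointer problem described in the message above can be sketched in Python. This is an analogy rather than the library's PHP code: the function names and the toy index are hypothetical. A plain inversion keeps the last pointer for a duplicated code point, as PHP's array_flip() does, so the first pointer is looked up for each listed code point before flipping.

```python
def flip_keep_last(index):
    """Naive inversion: a later pointer overwrites an earlier one."""
    return {code_point: pointer for pointer, code_point in index.items()}


def flip_with_first_duplicates(index, first_duplicates):
    """Invert an index, keeping the first pointer for the listed code points."""
    overrides = {}
    for code_point in first_duplicates:
        # Search in index order and stop at the first matching pointer.
        for pointer, value in index.items():
            if value == code_point:
                overrides[code_point] = pointer
                break
    flipped = flip_keep_last(index)
    # Restore the first-duplicate pointers lost by the naive flip.
    flipped.update(overrides)
    return flipped


# Hypothetical toy index: pointers 10 and 20 both decode to U+3000.
toy_index = {10: 0x3000, 11: 0x3001, 20: 0x3000}
```

With this toy index the naive flip maps U+3000 to pointer 20, while the second function maps it to pointer 10 when U+3000 is in the first-duplicate list.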
9e812ffdf8 Second stab at Shift_JIS
- Decoder implemented, with correct table
- Modernized decoder; may have bugs
- Backwards seeker hopefully, though it does not yet pass fuzzer
2020-10-06 16:12:57 -04:00
b284056644 Encode correct duplicate pointers in EUC-JP 2020-10-06 15:39:33 -04:00
46b6ac3c44 Complete and correct EUC-JP implementation 2020-10-06 11:47:22 -04:00
0682e294c8 Add new labels 2020-10-06 11:47:22 -04:00
7803b8af9e Cleanup 2020-10-06 11:47:20 -04:00
1200891feb Update changelog 2020-10-06 11:42:32 -04:00
f7246ccc34 Fix gb18030 seeking; tidy up 2020-10-06 11:42:32 -04:00
14d67ad49f Add fuzz test for backwards seeking
Test data is 1025 random bytes; gb18030 still fails
2020-10-06 11:42:32 -04:00
0eb2a8ac24 Fix bugs in gb18030 and UTF-16
- UTF-16 needs to restore dirtyEOF after seeking
- gb18030 now tracks errors like other non-synchronizing encodings
- gb18030 could produce null when asked for a character
2020-10-06 11:42:32 -04:00
a12a2a0413 Simplify EUC-KR seeking
This is in line with Big5 logic
2020-10-06 11:42:32 -04:00
be034a08e0 Move dirty EOF handling to UTF-16
It remains useful for this encoding, which is otherwise self-synchronizing
2020-10-06 11:42:32 -04:00