119 commits

Author SHA1 Message Date
986c709fce Update for PHP 8.4 2024-12-27 07:23:00 -05:00
88dbf8398a Don't use @; fix dynamic properties 2023-01-25 17:12:58 -05:00
07d26e3f45 Add BOM handling
Per specification this does not extend to GB18030
2021-10-24 10:37:46 -04:00
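A minimal sketch of the kind of BOM check this commit describes, assuming it follows the WHATWG Encoding Standard's BOM sniff, which recognises only UTF-8 and the two UTF-16 byte orders. The function name `sniffBom` is hypothetical and is not taken from the library.

```php
<?php
// Hypothetical helper, not the library's API: report the encoding implied by
// a leading byte order mark, or null when the string does not start with one.
function sniffBom(string $data): ?string {
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return "UTF-8";
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {
        return "UTF-16BE";
    }
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        return "UTF-16LE";
    }
    // The gb18030 encoding of U+FEFF (84 31 95 33) is deliberately not
    // treated as a signature, matching the note in the commit message.
    return null;
}
```

A caller would typically strip the matched bytes before decoding and let the detected encoding override any declared label.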
2e2ed16788 Tests for ISO-2022-JP spanning 2021-03-25 15:02:32 -04:00
143590cb53 Hopefully less incorrect spanning for ISO-2022-JP 2021-03-24 18:36:58 -04:00
e5aac0b409 Improved spanning for ISO-2022-JP 2021-03-24 15:20:34 -04:00
d9d92e5e77 Test all spanning other than ISO-2022-JP
ISO-2022-JP will require a more careful implementation to deal with
mode changes to ASCII or Roman mode
2021-03-24 09:36:16 -04:00
81186973f1 Partial tests for ASCII spanning 2021-03-24 08:59:59 -04:00
60a5487e46 Fix spanning with single-byte encodings 2021-03-16 18:58:18 -04:00
cc9c937810 Don't rely on PHP 8 signature changes 2021-03-12 23:01:37 -05:00
bf81571ce4 Prototype strspn equivalent 2021-03-12 18:29:07 -05:00
87ec30a375 Explicit constant visibility
Also partially revert change to encoder determination
2020-10-24 22:53:12 -04:00
600379a4dd Fill out API documentation 2020-10-24 14:24:23 -04:00
c234702cce Speed up encoding; make ISO 2022-JP more consistent
- The ISO 2022-JP encoder is now static as with all others; this is
slightly slower, but localises the encoder logic to its class
- Indexed encoders now cache pointer tables on first use, yielding
significant performance benefits
- Encoding multiple characters now uses fewer function calls, yielding
moderate performance benefits at the expense of slight complication
2020-10-19 23:12:45 -04:00
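The "cache pointer tables on first use" point could look roughly like the sketch below. The class name, the three-entry table, and the method name are all assumptions made for illustration; the library's real tables and structure are not shown in this log.

```php
<?php
// Hypothetical indexed encoder; names and table contents are illustrative only.
final class IndexedEncoderSketch {
    // Pointer => code point table, in the form a decoder would use it
    private const TABLE = [0 => 0x20AC, 1 => 0x201A, 2 => 0x0192];

    /** @var array<int,int>|null Code point => pointer table, built lazily */
    private static $pointers = null;

    public static function pointerFor(int $codePoint): ?int {
        // Build the reverse table once, on the first call, and reuse it after
        if (self::$pointers === null) {
            self::$pointers = array_flip(self::TABLE);
        }
        return self::$pointers[$codePoint] ?? null;
    }
}
```

The trade-off is a small amount of memory held for the lifetime of the process in exchange for skipping the table construction on every later call.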
efdac91b30 Optimize ISO 2022-JP encoder 2020-10-19 19:08:43 -04:00
be2134cc71 API re-organization 2020-10-18 15:32:49 -04:00
464bc4a0a9 Specify PHP 7.1 requirement 2020-10-17 17:55:26 -04:00
cde4100b8a Make correct termination of an ISO 2022-JP output string easier 2020-10-17 14:13:36 -04:00
808b4128dd Tests for replacement encoding; readme correction 2020-10-17 13:52:32 -04:00
ffa3f431d6 Coverage fixes 2020-10-16 20:35:27 -04:00
d580e93e52 ISO 2022-JP encoder tests and fixes 2020-10-16 20:13:53 -04:00
10328b6806 Tests for general encoder 2020-10-15 16:19:57 -04:00
db738bba99 Encoder for x-user-defined 2020-10-15 12:34:03 -04:00
a57dde6dbd Style fixes 2020-10-15 10:39:44 -04:00
16f411c767 Prototype ISO 2022-JP encoder
The encoder currently operates only on single code points, but will later be
expanded to operate on iterables to construct complete strings. For encodings
other than ISO 2022-JP this is merely a convenience, but the algorithm for
that encoding mandates that encoded strings terminate in a switch to ASCII
mode, which a single-character encoder cannot accomplish by itself.
2020-10-14 23:45:32 -04:00
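To illustrate why a single-character encoder cannot finish an ISO 2022-JP string on its own, here is a small sketch of the termination step. The constants and the function `terminateIso2022Jp` are hypothetical; only the escape sequence ESC ( B for returning to ASCII mode comes from the encoding itself.

```php
<?php
// Hypothetical mode constants for the encoder's state
const MODE_ASCII = 0;
const MODE_ROMAN = 1;
const MODE_JIS0208 = 2;

// Hypothetical finishing step: if the encoder is not in ASCII mode when the
// input ends, append ESC ( B so the output string ends in ASCII mode.
function terminateIso2022Jp(string $bytes, int $mode): string {
    return $mode === MODE_ASCII ? $bytes : $bytes . "\x1B\x28\x42";
}

// ESC $ B switches to JIS X 0208; 0x30 0x21 is one two-byte character.
$encoded = "\x1B\x24\x42\x30\x21";
$complete = terminateIso2022Jp($encoded, MODE_JIS0208);
```

Because the mode is only known after the last character has been encoded, this step has to live in whatever routine sees the whole string, which is the limitation the commit message points out.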
cdd1c0182b Corrected ISO 2022-JP decoder and seeker 2020-10-14 12:29:19 -04:00
9f7e496bf6 Plug potential memory leak 2020-10-13 18:58:53 -04:00
2f3ad29ce6 Prototype ISO 2022-JP decoder 2020-10-10 23:08:51 -04:00
96846d061c Complete Shift_JIS testing 2020-10-08 19:22:32 -04:00
915aa7ca93 Finally fix Shift_JIS seeker 2020-10-07 22:48:50 -04:00
4b2a396c64 Prototype for replacement encoding 2020-10-07 15:53:58 -04:00
ef9932ffcb Correct various ShiftJIS errors 2020-10-07 14:11:19 -04:00
d9b8cd8dd1 Fixes for multi-byte index-based encoders
- array_flip() retains the last duplicate, when we need the first
- Indexes are now prepared with a list of first-duplicate code points
to search before flipping
- This affected only U+3000 in GBK
- Big5 did not use array_flip(), but its list of override code points
did not include U+2561; Big5 now flips like the others
- EUC-JP had a long list of errors, but this encoding was not
previously released
- Shift_JIS' indexes are probably not correct, still
2020-10-07 11:28:21 -04:00
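The `array_flip()` behaviour behind this fix can be reproduced with a toy table. The table below is invented for illustration, and the first-wins loop is a simpler stand-in for the commit's actual approach (searching a prepared list of first-duplicate code points before flipping).

```php
<?php
// Hypothetical index: pointers 1 and 3 both decode to U+3000.
$index = [1 => 0x3000, 2 => 0x3001, 3 => 0x3000];

// array_flip() keeps the last key seen for a duplicated value,
// so U+3000 ends up mapped to pointer 3 rather than pointer 1.
$naive = array_flip($index);

// Illustrative alternative: record a code point only the first time it appears.
function flipKeepingFirst(array $index): array {
    $out = [];
    foreach ($index as $pointer => $codePoint) {
        if (!isset($out[$codePoint])) {
            $out[$codePoint] = $pointer;
        }
    }
    return $out;
}

$first = flipKeepingFirst($index);
```

An encoder built from `$naive` would emit the later pointer for U+3000, which is the class of error the commit describes for GBK.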
9e812ffdf8 Second stab at Shift_JIS
- Decoder implemented, with correct table
- Modernized decoder; may have bugs
- Backwards seeker hopefully works, though it does not yet pass the fuzzer
2020-10-06 16:12:57 -04:00
b284056644 Encode correct duplicate pointers in EUC-JP 2020-10-06 15:39:33 -04:00
46b6ac3c44 Complete and correct EUC-JP implementation 2020-10-06 11:47:22 -04:00
0682e294c8 Add new labels 2020-10-06 11:47:22 -04:00
f7246ccc34 Fix gb18030 seeking; tidy up 2020-10-06 11:42:32 -04:00
0eb2a8ac24 Fix bugs in gb18030 and UTF-16
- UTF-16 needs to restore dirtyEOF after seeking
- gb18030 now tracks errors like other non-synchronizing encodings
- gb18030 could produce null when asked for a character
2020-10-06 11:42:32 -04:00
a12a2a0413 Simplify EUC-KR seeking
This is in line with Big5 logic
2020-10-06 11:42:32 -04:00
be034a08e0 Move dirty EOF handling to UTF-16
It remains useful for this encoding, which is otherwise self-synchronizing
2020-10-06 11:42:32 -04:00
1f007b88f1 Fix UTF-8 seeking through truncated sequences 2020-10-06 11:42:32 -04:00
220cbce9a0 Address performance regression in peeking 2020-10-06 11:42:32 -04:00
9f08fb7424 Fix backwards seeking for Big5
Other non-synchronizing encodings will also need fixing
2020-10-06 11:42:32 -04:00
6417e8f0be Start overhauling error handling; adjust coverage annotations 2020-10-06 11:42:32 -04:00
e06096c624 Ensure seekBack is defined 2020-10-06 11:42:32 -04:00
61a77086bb Make GenericEncoding trait an abstract class 2020-10-06 11:42:32 -04:00
235fdc4103 Note self-synchronizing encodings for later 2020-10-06 11:42:32 -04:00
a3c16252b8 Correct documentation of StatefulEncoding 2020-10-06 11:42:32 -04:00
f69cd98b4c Make posErr fully generic 2020-10-06 11:42:32 -04:00