986c709fce
Update for PHP 8.4
2024-12-27 07:23:00 -05:00
88dbf8398a
Don't use @; fix dynamic properties
2023-01-25 17:12:58 -05:00
07d26e3f45
Add BOM handling
...
Per specification this does not extend to GB18030
2021-10-24 10:37:46 -04:00
2e2ed16788
Tests for ISO-2022-JP spanning
2021-03-25 15:02:32 -04:00
143590cb53
Hopefully less incorrect spanning for ISO-2022-JP
2021-03-24 18:36:58 -04:00
e5aac0b409
Improved spanning for ISO-2022-JP
2021-03-24 15:20:34 -04:00
d9d92e5e77
Test all spanning other than ISO-2022-JP
...
ISO-2022-JP will require a more careful implementation to deal with
mode changes to ASCII or Roman mode
2021-03-24 09:36:16 -04:00
81186973f1
Partial tests for ASCII spanning
2021-03-24 08:59:59 -04:00
60a5487e46
Fix spanning with single-byte encodings
2021-03-16 18:58:18 -04:00
cc9c937810
Don't rely on PHP 8 signature changes
2021-03-12 23:01:37 -05:00
bf81571ce4
Prototype strspn equivalent
2021-03-12 18:29:07 -05:00
87ec30a375
Explicit constant visibility
...
Also partially revert change to encoder determination
2020-10-24 22:53:12 -04:00
600379a4dd
Fill out API documentation
2020-10-24 14:24:23 -04:00
c234702cce
Speed up encoding; make ISO 2022-JP more consistent
...
- The ISO 2022-JP encoder is now static as with all others; this is
slightly slower, but localises the encoder logic to its class
- Indexed encoders now cache pointer tables on first use, yielding
significant performance benefits
- Encoding multiple characters now uses fewer function calls, yielding
moderate performance benefits at the expense of slight complication
2020-10-19 23:12:45 -04:00
efdac91b30
Optimize ISO 2022-JP encoder
2020-10-19 19:08:43 -04:00
be2134cc71
API re-organization
2020-10-18 15:32:49 -04:00
464bc4a0a9
Specify PHP 7.1 requirement
2020-10-17 17:55:26 -04:00
cde4100b8a
Make correct termination of an ISO 2022-JP output string easier
2020-10-17 14:13:36 -04:00
808b4128dd
Tests for replacement encoding; readme correction
2020-10-17 13:52:32 -04:00
ffa3f431d6
Coverage fixes
2020-10-16 20:35:27 -04:00
d580e93e52
ISO 2022-JP encoder tests and fixes
2020-10-16 20:13:53 -04:00
10328b6806
Tests for general encoder
2020-10-15 16:19:57 -04:00
db738bba99
Encoder for x-user-defined
2020-10-15 12:34:03 -04:00
a57dde6dbd
Style fixes
2020-10-15 10:39:44 -04:00
16f411c767
Prototype ISO 2022-JP encoder
...
The encoder currently operates only on single code points, but will later be
expanded to operate on iterables to construct complete strings. For encodings
other than ISO 2022-JP this is merely a convenience, but the algorithm for
that encoding mandates that encoded strings terminate in a switch to ASCII
mode, which a single-character encoder cannot accomplish by itself.
2020-10-14 23:45:32 -04:00
cdd1c0182b
Corrected ISO 2022-JP decoder and seeker
2020-10-14 12:29:19 -04:00
9f7e496bf6
Plug potential memory leak
2020-10-13 18:58:53 -04:00
2f3ad29ce6
Prototype ISO 2022-JP decoder
2020-10-10 23:08:51 -04:00
96846d061c
Complete Shift_JIS testing
2020-10-08 19:22:32 -04:00
915aa7ca93
Finally fix Shift_JIS seeker
2020-10-07 22:48:50 -04:00
4b2a396c64
Prototype for replacement encoding
2020-10-07 15:53:58 -04:00
ef9932ffcb
Correct various Shift_JIS errors
2020-10-07 14:11:19 -04:00
d9b8cd8dd1
Fixes for multi-byte index-based encoders
...
- array_flip() retains the last duplicate, when we need the first
- Indexes are now prepared with a list of first-duplicate code points
to search before flipping
- This affected only U+3000 in GBK
- Big5 did not use array_flip(), but its list of override code points
did not include U+2561; Big5 now flips like the others
- EUC-JP had a long list of errors, but this encoding was not
previously released
- Shift_JIS indexes are probably still incorrect
2020-10-07 11:28:21 -04:00
9e812ffdf8
Second stab at Shift_JIS
...
- Decoder implemented, with correct table
- Modernized decoder; may have bugs
- Backwards seeker implemented, hopefully, though it does not yet pass the fuzzer
2020-10-06 16:12:57 -04:00
b284056644
Encode correct duplicate pointers in EUC-JP
2020-10-06 15:39:33 -04:00
46b6ac3c44
Complete and correct EUC-JP implementation
2020-10-06 11:47:22 -04:00
0682e294c8
Add new labels
2020-10-06 11:47:22 -04:00
f7246ccc34
Fix gb18030 seeking; tidy up
2020-10-06 11:42:32 -04:00
0eb2a8ac24
Fix bugs in gb18030 and UTF-16
...
- UTF-16 needs to restore dirtyEOF after seeking
- gb18030 now tracks errors like other non-synchronizing encodings
- gb18030 could produce null when asked for a character
2020-10-06 11:42:32 -04:00
a12a2a0413
Simplify EUC-KR seeking
...
This is in line with Big5 logic
2020-10-06 11:42:32 -04:00
be034a08e0
Move dirty EOF handling to UTF-16
...
It remains useful for this encoding, which is otherwise self-synchronizing
2020-10-06 11:42:32 -04:00
1f007b88f1
Fix UTF-8 seeking through truncated sequences
2020-10-06 11:42:32 -04:00
220cbce9a0
Address performance regression in peeking
2020-10-06 11:42:32 -04:00
9f08fb7424
Fix backwards seeking for Big5
...
Other non-synchronizing encodings will also need fixing
2020-10-06 11:42:32 -04:00
6417e8f0be
Start overhauling error handling; adjust coverage annotations
2020-10-06 11:42:32 -04:00
e06096c624
Ensure seekBack is defined
2020-10-06 11:42:32 -04:00
61a77086bb
Make GenericEncoding trait an abstract class
2020-10-06 11:42:32 -04:00
235fdc4103
Note self-synchronizing encodings for later
2020-10-06 11:42:32 -04:00
a3c16252b8
Correct documentation of StatefulEncoding
2020-10-06 11:42:32 -04:00
f69cd98b4c
Make posErr fully generic
2020-10-06 11:42:32 -04:00