A set of dependency-free basic internationalization tools

Dependency-free internationalization tools for PHP

While PHP's internationalization extension (intl) offers excellent and extensive functionality for dealing with human languages, character encodings, and related matters, it is not always available. Moreover, its character decoder does not yield the same results as WHATWG's Encoding standard, making it unsuitable for implementing parsers for URLs or HTML. The more widely used multi-byte string extension (mbstring) not only suffers from the same problems, but is also very slow.

Included here is a partial suite of WHATWG-compatible seekable string decoders which are reasonably performant while requiring no external dependencies or PHP extensions. At present it includes the following encodings:

  • UTF-8
  • UTF-16
  • gb18030
  • GBK
  • Big5
  • EUC-KR
  • all single-byte encodings
  • x-user-defined

Where applicable, code point encoders are also included. In time the suite will be extended to cover the full set of WHATWG character encodings, and it may also provide other character-centric internationalization functionality.
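
To illustrate what a "seekable" WHATWG-style decoder means in practice, here is a minimal, self-contained sketch of a UTF-8 decoder written without any extensions. It is not this library's code or API: the class name `SketchUtf8Decoder` and its methods `nextCode()` and `seek()` are hypothetical, chosen only for this example. It assumes PHP 7.4 or later, follows the byte-range rules of the Encoding standard's UTF-8 decoder in replacement mode, and uses a deliberately naive backward seek that re-decodes from the start of the string.

```php
<?php
declare(strict_types=1);

/** Hypothetical illustration only; not the API of this library. */
final class SketchUtf8Decoder {
    private string $s;
    private int $pos = 0;   // byte offset of the next unread byte
    private int $chars = 0; // number of code points consumed so far

    public function __construct(string $s) {
        $this->s = $s;
    }

    /** Returns the next code point, U+FFFD for invalid input, or null at the end. */
    public function nextCode(): ?int {
        $code = $this->decode();
        if ($code !== null) {
            $this->chars++;
        }
        return $code;
    }

    /** Moves by $distance code points; negative values move backward. */
    public function seek(int $distance): void {
        $target = max(0, $this->chars + $distance);
        if ($distance < 0) {
            // Naive approach: restart and re-decode up to the target.
            $this->pos = 0;
            $this->chars = 0;
        }
        while ($this->chars < $target && $this->nextCode() !== null) {
            // nextCode() advances the position as a side effect
        }
    }

    private function decode(): ?int {
        $len = strlen($this->s);
        if ($this->pos >= $len) {
            return null;
        }
        $b = ord($this->s[$this->pos++]);
        if ($b < 0x80) {
            return $b; // ASCII
        }
        $lower = 0x80;
        $upper = 0xBF;
        if ($b >= 0xC2 && $b <= 0xDF) {
            $needed = 1;
            $cp = $b & 0x1F;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            if ($b === 0xE0) { $lower = 0xA0; } // reject overlong forms
            if ($b === 0xED) { $upper = 0x9F; } // reject surrogates
            $needed = 2;
            $cp = $b & 0x0F;
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            if ($b === 0xF0) { $lower = 0x90; } // reject overlong forms
            if ($b === 0xF4) { $upper = 0x8F; } // reject values above U+10FFFF
            $needed = 3;
            $cp = $b & 0x07;
        } else {
            return 0xFFFD; // invalid lead byte
        }
        while ($needed-- > 0) {
            if ($this->pos >= $len) {
                return 0xFFFD; // truncated sequence at end of input
            }
            $b = ord($this->s[$this->pos]);
            if ($b < $lower || $b > $upper) {
                return 0xFFFD; // offending byte is left unread, per the standard
            }
            $this->pos++;
            $lower = 0x80;
            $upper = 0xBF;
            $cp = ($cp << 6) | ($b & 0x3F);
        }
        return $cp;
    }
}

// Usage: one valid ASCII byte, a two-byte sequence, a stray 0xFF, a four-byte sequence.
$decoder = new SketchUtf8Decoder("a\u{E9}\xFF\u{1F600}");
$codes = [];
while (($code = $decoder->nextCode()) !== null) {
    $codes[] = $code;
}
// $codes is now [0x61, 0xE9, 0xFFFD, 0x1F600]
$decoder->seek(-2);            // step back two code points
$again = $decoder->nextCode(); // 0xFFFD, the replacement for the stray 0xFF
```

Re-decoding from the start keeps the backward seek trivially correct at the cost of linear time; a production decoder would more likely step backward through the bytes directly.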