A set of dependency-free basic internationalization tools


Dependency-free internationalization tools for PHP

While PHP's internationalization extension offers excellent and extensive functionality for dealing with human languages, character encodings, and various related things, it is not always available. Moreover, its character decoder does not yield the same results as WHATWG's Encoding standard, making it unsuitable for implementing parsers for URLs or HTML. The more widely used multi-byte string extension not only suffers from the same problems, but is also very slow.

Included here is a partial suite of WHATWG-compatible seekable string decoders which are reasonably performant while requiring no external dependencies or PHP extensions. At present it includes the following encodings:

  • UTF-8
  • gb18030
  • GBK
  • and all single-byte encodings
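To illustrate what a "seekable string decoder" involves, here is a minimal sketch of a UTF-8 decoder written in plain PHP with no extensions. It is not this library's API: the class name `SketchUtf8Decoder` and its methods `nextCode()` and `seek()` are hypothetical names chosen for the example. The error handling is intended to follow the WHATWG UTF-8 decoder in replacement mode (invalid or truncated sequences yield U+FFFD), and backward seeking is done by remembering the byte offset of each character already decoded, which is only one of several possible approaches.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a seekable UTF-8 decoder (not the library's class).
 * Invalid input yields U+FFFD, in the manner of the WHATWG UTF-8 decoder.
 */
final class SketchUtf8Decoder {
    private string $bytes;
    private int $bytePos = 0;
    private int $charPos = 0;
    /** @var int[] byte offset at which each character position starts */
    private array $offsets = [0];

    public function __construct(string $bytes) {
        $this->bytes = $bytes;
    }

    /** @return int|false the next code point, or false at end of input */
    public function nextCode() {
        $len = strlen($this->bytes);
        if ($this->bytePos >= $len) {
            return false;
        }
        $code = 0;
        $needed = 0; // continuation bytes expected for the current sequence
        $seen = 0;   // continuation bytes consumed so far
        $lower = 0x80;
        $upper = 0xBF;
        while ($this->bytePos < $len) {
            $b = ord($this->bytes[$this->bytePos++]);
            if ($needed === 0) {
                if ($b < 0x80) {
                    $code = $b;
                    break;
                } elseif ($b >= 0xC2 && $b <= 0xDF) {
                    $needed = 1;
                    $code = $b & 0x1F;
                } elseif ($b >= 0xE0 && $b <= 0xEF) {
                    if ($b === 0xE0) { $lower = 0xA0; } // reject overlong forms
                    if ($b === 0xED) { $upper = 0x9F; } // reject surrogates
                    $needed = 2;
                    $code = $b & 0x0F;
                } elseif ($b >= 0xF0 && $b <= 0xF4) {
                    if ($b === 0xF0) { $lower = 0x90; } // reject overlong forms
                    if ($b === 0xF4) { $upper = 0x8F; } // reject > U+10FFFF
                    $needed = 3;
                    $code = $b & 0x07;
                } else {
                    $code = 0xFFFD; // invalid lead byte
                    break;
                }
            } elseif ($b < $lower || $b > $upper) {
                // Invalid continuation: put the byte back and emit U+FFFD
                $this->bytePos--;
                $code = 0xFFFD;
                $needed = 0;
                break;
            } else {
                $lower = 0x80;
                $upper = 0xBF;
                $code = ($code << 6) | ($b & 0x3F);
                if (++$seen === $needed) {
                    $needed = 0;
                    break;
                }
            }
        }
        if ($needed !== 0) {
            $code = 0xFFFD; // input ended in the middle of a sequence
        }
        $this->offsets[++$this->charPos] = $this->bytePos;
        return $code;
    }

    /** Move forward (positive) or backward (negative) by whole characters */
    public function seek(int $distance): void {
        if ($distance < 0) {
            $this->charPos = max(0, $this->charPos + $distance);
            $this->bytePos = $this->offsets[$this->charPos];
        } else {
            while ($distance-- > 0 && $this->nextCode() !== false);
        }
    }
}

$d = new SketchUtf8Decoder("a\u{E9}\u{20AC}");
while (($code = $d->nextCode()) !== false) {
    printf("U+%04X\n", $code); // prints U+0061, U+00E9, U+20AC
}
```

Recording offsets costs memory proportional to the number of characters read; a decoder could instead step backward by scanning for lead bytes, at the price of more careful handling of malformed input.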

Where applicable, code point encoders are also included. In time it will be extended to cover the entire suite of WHATWG character encodings, and may also provide other character-centric internationalization functionality.
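As a rough idea of what a code point encoder does, the following sketch converts a single Unicode code point to its UTF-8 byte sequence using only core PHP. The function name `sketchEncodeUtf8` is hypothetical and is not taken from this library; throwing an exception for surrogates and out-of-range values is likewise an assumption made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: encode one Unicode scalar value as UTF-8.
 * Surrogates and values outside U+0000..U+10FFFF are rejected.
 */
function sketchEncodeUtf8(int $code): string {
    if ($code < 0 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        throw new InvalidArgumentException("Not a Unicode scalar value: $code");
    }
    if ($code < 0x80) {
        // One byte: 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

echo bin2hex(sketchEncodeUtf8(0x20AC)), "\n"; // prints e282ac
```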