A set of dependency-free basic internationalization tools
J. King 58328b7524 Changelog for 0.4.0 6 years ago
lib/Encoding Full tests for EUC-KR 6 years ago
perf Style fixes 6 years ago
tests Full tests for EUC-KR 6 years ago
tools Full tests for EUC-KR 6 years ago
vendor-bin Tidying 6 years ago
.gitattributes Initial commit with a few states; not yet tested 6 years ago
.gitignore Add a performance profiling script 6 years ago
.php_cs.dist Style fixes 6 years ago
AUTHORS Split off UTF-8 tools from URL parser 6 years ago
CHANGELOG Changelog for 0.4.0 6 years ago
LICENSE Split off UTF-8 tools from URL parser 6 years ago
README.md Full tests for EUC-KR 6 years ago
RoboFile.php Make performance comparison fairer 6 years ago
composer.json Refactor tests 6 years ago
composer.lock Composer tweaks 6 years ago
robo Add infrstructure required for tests 6 years ago
robo.bat Add infrstructure required for tests 6 years ago

README.md

Dependency-free internationalization tools for PHP

While PHP's internationalization extension (`intl`) offers excellent and extensive functionality for dealing with human languages, character encodings, and related concerns, it is not always available. Moreover, its character decoder does not yield the same results as WHATWG's Encoding standard, making it unsuitable for implementing parsers for URLs or HTML. The more widely used multi-byte string extension (`mbstring`) suffers from the same problems and is also very slow.

Included here is a partial suite of WHATWG-compatible seekable string decoders which are reasonably performant while requiring no external dependencies or PHP extensions. At present it includes the following encodings:

  • UTF-8
  • UTF-16
  • gb18030
  • GBK
  • Big5
  • EUC-KR
  • all single-byte encodings
  • x-user-defined

Where applicable, code point encoders are also included. In time the library will be extended to cover the entire suite of WHATWG character encodings, and it may also provide other character-centric internationalization functionality.
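To make "WHATWG-compatible" and "dependency-free" concrete, here is a minimal sketch of a UTF-8 decoder written in plain PHP that follows the error-handling rules of the WHATWG Encoding standard's UTF-8 decoder. The class name `Utf8Sketch`, its method `nextCode()`, and the helper `decodeAll()` are hypothetical names chosen for illustration; they are not this library's API, and the sketch omits seeking entirely.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only; NOT the API of this library.
final class Utf8Sketch
{
    private string $data;
    private int $pos = 0; // byte offset of the next unread byte

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    /**
     * Returns the next code point, 0xFFFD for an invalid or truncated
     * sequence, or null once the input is exhausted.
     */
    public function nextCode(): ?int
    {
        $len = strlen($this->data);
        if ($this->pos >= $len) {
            return null;
        }
        $b = ord($this->data[$this->pos++]);
        if ($b < 0x80) {
            return $b; // ASCII byte
        }
        // Allowed range for the next continuation byte; the first one is
        // narrowed for some lead bytes to reject overlong forms,
        // surrogates, and values above U+10FFFF.
        $lower = 0x80;
        $upper = 0xBF;
        if ($b >= 0xC2 && $b <= 0xDF) {
            $needed = 1;
            $code = $b & 0x1F;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            if ($b === 0xE0) {
                $lower = 0xA0;
            } elseif ($b === 0xED) {
                $upper = 0x9F;
            }
            $needed = 2;
            $code = $b & 0x0F;
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            if ($b === 0xF0) {
                $lower = 0x90;
            } elseif ($b === 0xF4) {
                $upper = 0x8F;
            }
            $needed = 3;
            $code = $b & 0x07;
        } else {
            return 0xFFFD; // stray continuation byte or invalid lead byte
        }
        while ($needed-- > 0) {
            if ($this->pos >= $len) {
                return 0xFFFD; // input ended in the middle of a sequence
            }
            $b = ord($this->data[$this->pos]);
            if ($b < $lower || $b > $upper) {
                // Leave the offending byte unread so that it is decoded
                // again as the start of the next sequence.
                return 0xFFFD;
            }
            $this->pos++;
            $lower = 0x80;
            $upper = 0xBF;
            $code = ($code << 6) | ($b & 0x3F);
        }
        return $code;
    }
}

/** Decodes a whole string into a list of code points (hypothetical helper). */
function decodeAll(string $data): array
{
    $decoder = new Utf8Sketch($data);
    $out = [];
    while (($code = $decoder->nextCode()) !== null) {
        $out[] = $code;
    }
    return $out;
}
```

With this sketch, `decodeAll("\xE2\x82\xAC")` yields `[0x20AC]` (the euro sign), while `decodeAll("a\xC0b")` yields `[0x61, 0xFFFD, 0x62]`: the invalid byte becomes a single replacement character and decoding resumes at the following byte.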