Modern DOM library written in PHP for HTML documents
2021-03-17 17:21:10 -04:00

HTML5

Tools for parsing and printing HTML5 documents and fragments.

<?php
$dom = dW\HTML5\Parser::parse('<!DOCTYPE html><html lang="en" charset="utf-8"><head><title>Ook!</title></head><body><h1>Ook!</h1><p>Ook-ook? Oooook. Ook ook oook ook oooooook ook ooook ook.</p><p>Eek!</p></body></html>');
?>

or:

<?php
$dom = new dW\HTML5\Document;
$dom->loadHTML('<!DOCTYPE html><html lang="en" charset="utf-8"><head><title>Ook!</title></head><body><h1>Ook!</h1><p>Ook-ook? Oooook. Ook ook oook ook oooooook ook ooook ook.</p><p>Eek!</p></body></html>');
?>

Comparison with masterminds/html5

This library and masterminds/html5 serve similar purposes. Generally, we follow the specification more closely, but they are much faster: roughly 2.5 times in the parsing benchmark at the end of the table. The following table summarizes the main functional differences.

| | Masterminds | MensBeam |
| --- | --- | --- |
| Minimum PHP version | 5.3 | 7.1 |
| Extensions required | dom, ctype, mbstring or iconv | dom |
| Supported encodings | System-dependent | Per specification |
| Encoding detection | None | Byte order mark, HTTP header, pre-scan |
| Fallback encoding | UTF-8, configurable | Windows-1252, configurable |
| Handling of invalid characters | Characters are dropped | Per specification |
| Handling of invalid XML element names | Name is changed to "invalid" | Per specification |
| Handling of invalid XML attribute names | Attribute is dropped | Per specification |
| Handling of misnested tags | Parent end tags always close children | Per specification |
| Handling of data between table cells | Left as-is | Per specification |
| Handling of omitted start tags | Elements are not inserted | Per specification |
| Handling of processing instructions | Processing instructions are retained | Per specification |
| Namespace for HTML elements | Per specification, configurable | Null |
| Time needed to parse the single-page HTML specification | 2.8 seconds† | 7.0 seconds†† |
| Peak memory needed to parse the same document | 38 MB | 13.9 MB |

† With the HTML namespace disabled. With it enabled, masterminds/html5 does not finish in a reasonable time due to a PHP bug.

†† With parse errors suppressed. Reporting parse errors adds approximately 10% overhead.
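The "Encoding detection" and "Fallback encoding" rows in the comparison table refer to the encoding sniffing steps of the HTML and WHATWG Encoding specifications. As an illustration only, here is a minimal sketch of the first step, checking for a byte order mark, followed directly by the Windows-1252 fallback. `sniffEncoding()` is a hypothetical helper, not part of dW\HTML5, and the later steps (such as the HTTP header and the `<meta>` pre-scan) are deliberately left out.

```php
<?php
// Hypothetical helper, not part of dW\HTML5: a sketch of the byte order mark
// check that starts encoding detection, with a configurable fallback.
// The HTTP header and <meta> pre-scan steps are intentionally omitted.
function sniffEncoding(string $bytes, string $fallback = 'windows-1252'): string
{
    // The three byte order marks recognized by the WHATWG Encoding Standard
    if (substr($bytes, 0, 3) === "\xEF\xBB\xBF") {
        return 'UTF-8';
    }
    if (substr($bytes, 0, 2) === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    if (substr($bytes, 0, 2) === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    // No BOM: a full implementation would consult the transport layer and
    // pre-scan the first bytes here; this sketch goes straight to the fallback.
    return $fallback;
}

echo sniffEncoding("\xEF\xBB\xBF<!DOCTYPE html>"), "\n"; // UTF-8
echo sniffEncoding("<!DOCTYPE html>"), "\n";             // windows-1252
```

A BOM, when present, takes precedence over every other signal, which is why it is checked before anything else.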
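The table lists invalid XML element and attribute names as handled "per specification". For orientation, the HTML specification's section on coercing an HTML DOM into an infoset permits replacing each character an XML API cannot accept in a local name with `U` followed by the character's code point as six uppercase hexadecimal digits, so `foo<bar` becomes `fooU00003Cbar`. The sketch below implements that mapping with a hypothetical `coerceName()` function; it is not necessarily how dW\HTML5 does it. It assumes valid UTF-8 input, PHP 7.2+ with mbstring (for `mb_ord()`), and treats names as XML NCNames, so colons are escaped too.

```php
<?php
// Hypothetical helper, not part of dW\HTML5: escapes characters that are not
// allowed in an XML local name (NCName) as "U" plus six uppercase hex digits.
// Assumes valid UTF-8 input and PHP 7.2+ with mbstring (for mb_ord()).

// XML 1.0 NameStartChar ranges, minus ":" because local names are NCNames
const NAME_START_CHARS = 'A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}'
    . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}'
    . '\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}'
    . '\x{10000}-\x{EFFFF}';

// Extra characters XML allows after the first position (NameChar)
const NAME_CHARS_EXTRA = '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';

function coerceName(string $name): string
{
    $escape = function (array $m): string {
        return sprintf('U%06X', mb_ord($m[0], 'UTF-8'));
    };
    // Escape a first character that is not a NameStartChar...
    $name = preg_replace_callback('/^[^' . NAME_START_CHARS . ']/u', $escape, $name);
    // ...then any character that is not a NameChar. The first character is
    // now always a NameStartChar, so this pass leaves it untouched.
    return preg_replace_callback('/[^' . NAME_START_CHARS . NAME_CHARS_EXTRA . ']/u', $escape, $name);
}

echo coerceName('foo<bar'), "\n";    // fooU00003Cbar
echo coerceName('xlink:href'), "\n"; // xlinkU00003Ahref

// The coerced name is accepted by PHP's DOM extension
$doc = new DOMDocument();
echo $doc->createElement(coerceName('a#b'))->nodeName, "\n"; // aU000023b
```

Unlike renaming every invalid element to a fixed placeholder, this escaping is reversible, so the original name can be recovered when serializing back to HTML.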
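To make the "Namespace for HTML elements" row concrete: under the HTML specification, parsed HTML elements belong to the XHTML namespace (`http://www.w3.org/1999/xhtml`), whereas the MensBeam column leaves their namespace null. The snippet below uses only PHP's built-in DOM extension, not either parser, to show what that difference looks like on a node; `$specced` and `$plain` are illustrative names.

```php
<?php
// Contrast the two "Namespace for HTML elements" behaviours from the table
// using only PHP's built-in DOM extension (no HTML5 parser involved).
$doc = new DOMDocument();

// Element in the HTML namespace, as the specification's parser would produce
$specced = $doc->createElementNS('http://www.w3.org/1999/xhtml', 'p');

// Element with no namespace, as in the "Null" column
$plain = $doc->createElement('p');

var_dump($specced->namespaceURI); // string(28) "http://www.w3.org/1999/xhtml"
var_dump($plain->namespaceURI);   // NULL
```

One practical consequence in PHP is that a `DOMXPath` query such as `//p` matches elements with no namespace directly, while elements in the XHTML namespace can only be matched after registering a prefix for that namespace.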