Modern DOM library written in PHP for HTML documents

HTML

Tools for parsing and printing HTML5 documents and fragments.

<?php
// Parse a complete document from a string
$dom = MensBeam\HTML\Parser::parse('<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Ook!</title></head><body><h1>Ook!</h1><p>Ook-ook? Oooook. Ook ook oook ook oooooook ook ooook ook.</p><p>Eek!</p></body></html>');
?>

or:

<?php
// Create an empty document, then load markup into it
$dom = new MensBeam\HTML\Document;
$dom->loadHTML('<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Ook!</title></head><body><h1>Ook!</h1><p>Ook-ook? Oooook. Ook ook oook ook oooooook ook ooook ook.</p><p>Eek!</p></body></html>');
?>

Comparison with masterminds/html5

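This library and masterminds/html5 serve similar purposes. Generally, we are more accurate, but they are considerably faster (see the timing row at the end of the table). The following table summarizes the main functional differences, with PHP's built-in DOMDocument included for reference.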

| | DOMDocument | Masterminds | MensBeam |
|---|---|---|---|
| Minimum PHP version | 5.0 | 5.3 | 7.1 |
| Extensions required | dom | dom, ctype, mbstring or iconv | dom |
| Target HTML version | HTML 4.01 | HTML 5.0 | WHATWG Living Standard |
| Supported encodings | System-dependent | System-dependent | Per specification |
| Encoding detection | BOM, http-equiv | None | Per specification (Steps 1-5 & 9) |
| Fallback encoding | ISO 8859-1 | UTF-8, configurable | Windows-1252, configurable |
| Handling of invalid characters | Bytes are passed through | Characters are dropped | Per specification |
| Handling of invalid XML element names | Variable | Name is changed to "invalid" | Per specification |
| Handling of invalid XML attribute names | Variable | Attribute is dropped | Per specification |
| Handling of misnested tags | Parent end tags always close children | Parent end tags always close children | Per specification |
| Handling of data between table cells | Left as-is | Left as-is | Per specification |
| Handling of omitted start tags | Elements are not inserted | Elements are not inserted | Per specification |
| Handling of processing instructions | Processing instructions are retained | Processing instructions are retained | Per specification |
| Handling of bogus XLink namespace* | Foreign content not supported | XLink attributes are lost if preceded by bogus namespace | Bogus namespace is ignored |
| Namespace for HTML elements | Null | Per specification, configurable | Null |
| Time needed to parse the single-page HTML specification | 0.5 seconds | 2.7 seconds† | 6.0 seconds‡ |
| Peak memory needed for the same | 11.6 MB | 38 MB | 13.9 MB |

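* For example: <svg xmlns:xlink='http://www.w3.org/1999/xhtml' xlink:href='http://example.com/'/>, where the xlink prefix is bound to the XHTML namespace rather than the XLink namespace (http://www.w3.org/1999/xlink). It is unclear what the correct behaviour is, but we believe our behaviour to be more consistent with the intent of the specification.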

† With the HTML namespace disabled. With it enabled, parsing does not finish in a reasonable time due to a PHP bug.

‡ With parse errors suppressed. Reporting parse errors adds approximately 10% overhead.
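The encoding rows in the table are the easiest differences to illustrate in plain PHP. What follows is a minimal sketch, not this library's implementation: the hypothetical `sniffEncoding()` performs only BOM sniffing (the first step of the HTML encoding sniffing algorithm) and otherwise returns a fallback, while the hypothetical `decodeWindows1252()` shows why a windows-1252 fallback differs from ISO 8859-1. The transport-layer charset, the `<meta>` prescan, and UTF-16 decoding are all deliberately omitted, and only core PHP functions are used.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: BOM sniffing only. Not the library's code; the
 * transport-layer charset, the <meta> prescan and later steps are omitted.
 *
 * @return array{0: string, 1: int} [encoding name, BOM length in bytes]
 */
function sniffEncoding(string $bytes, string $fallback = 'windows-1252'): array
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return ['UTF-8', 3];
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return ['UTF-16BE', 2];
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return ['UTF-16LE', 2];
    }
    // No BOM: a full implementation would consult other sources first.
    return [$fallback, 0];
}

/**
 * Hypothetical helper: encode one code point below U+10000 as UTF-8.
 */
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    // Every windows-1252 code point is below U+10000, so three bytes suffice.
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

/**
 * Hypothetical helper: decode windows-1252 bytes to UTF-8 using the WHATWG
 * mapping. Only bytes 0x80-0x9F differ from ISO 8859-1, which maps them to
 * C1 control characters instead of printable ones.
 */
function decodeWindows1252(string $bytes): string
{
    // Code points for bytes 0x80 through 0x9F, in order.
    static $high = [
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
    ];
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($bytes[$i]);
        // All other bytes map to the code point with the same value.
        $cp = ($byte >= 0x80 && $byte <= 0x9F) ? $high[$byte - 0x80] : $byte;
        $out .= codePointToUtf8($cp);
    }
    return $out;
}
```

For instance, `decodeWindows1252("\x80")` returns the euro sign (U+20AC), whereas an ISO 8859-1 decoder would produce the C1 control character U+0080; plain ASCII input such as `"<p>Ook!</p>"` has no BOM, so `sniffEncoding()` reports the windows-1252 fallback.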