
Started adding Document::adoptNode

wrapper-classes
Dustin Wilson 3 years ago
parent
commit
8f471ef430
  1. 5
      README.md
  2. 99
      lib/Document.php
  3. 29
      tests/cases/TestDocument.php

5
README.md

@@ -65,8 +65,9 @@ The primary aim of this library is accuracy. If the document model differs from
3. While `DOMDocumentType` can be extended and registered by PHP's `DOMDocument::registerNodeClass` `DOMImplementation` cannot; this means that doctypes created with `DOMImplementation::createDocumentType` can't ever be a registered class. Therefore, doctypes remain as `DOMDocumentType` in this library and retain the same limitations as ones in PHP's DOM.
4. The DOM specification mentions that [`HTMLCollection`][a] has to be kept around for backwards compatibility in browsers, but any new implementations should use [`sequence<T>`][b] instead which is essentially just a typed array object of some kind. Any methods should also return a copy of an object instead of a reference to the platform object, meaning the bane of any web developer's existence -- live lists -- shouldn't be in any new additions to the DOM. Since this implementation is not a fully userland PHP implementation of the DOM but instead an extension of it, this implementation will use `DOMNodeList` where the specification says to use an `HTMLCollection` and an array where the specification says to use a `sequence<T>`. In addition, if the specification states to return a static `NodeList` this implementation will use `MensBeam\HTML\DOM\NodeList` instead; this is because `DOMNodeList` is always live in PHP.
5. Aside from `HTMLTemplateElement` there are no other specific element classes such as `HTMLAnchorElement`, `HTMLDivElement`, etc. and therefore are no DOM methods and properties that are specific to those elements. Implementing them is possible, but we weighed it against its utility as each specific element slows down the DOM seemingly exponentially especially when parsing serialized HTML because each element has to be converted to the specific variety manually and recursively. For instance, when parsing the [WHATWG's single page HTML specification][d] (which is an absurdly enormous HTML document on the very edge of what we should be able to parse) in our tests it takes around 6.5 seconds; with specific element classes it instead takes *15 minutes*. [`phpgt/dom`][c] mitigates this by only converting when querying for elements, but it's still slow. We decided not to go this route.
-6. This implementation will not implement the `NodeIterator` and `TreeWalker` APIs. They are horribly conceived and impractical APIs that few people actually use because it's literally easier to write recursive loops to walk through the DOM than it is to use those APIs. They have instead been replaced with the `ChildNode::moonwalk`, `ParentNode::walk`, `ChildNode::walkFollowing`, and `ChildNode::walkPreceding` generators.
-7. Readonly properties inherited from PHP DOM cannot be overridden in this implementation and therefore might produce incorrect data. In many cases an additional standard property exists, but in most cases the property is simply useless for HTML so does absolutely nothing. Below are the properties that will show invalid or useless data along with suggested replacements if any:
+6. PHP's DOM has an `DOMDocument::adoptNode` method, but it returns an error saying it isn't implemented. `Document::adoptNode` doesn't work exactly like the specification because we cannot override the signature from the original method to make the `$node` argument a reference so that the original object variable is replaced, too. Otherwise, it works as it should; just be mindful of this unfortunate difference.
+7. This implementation will not implement the `NodeIterator` and `TreeWalker` APIs. They are horribly conceived and impractical APIs that few people actually use because it's literally easier to write recursive loops to walk through the DOM than it is to use those APIs. They have instead been replaced with the `ChildNode::moonwalk`, `ParentNode::walk`, `ChildNode::walkFollowing`, and `ChildNode::walkPreceding` generators.
+8. Readonly properties inherited from PHP DOM cannot be overridden in this implementation and therefore might produce incorrect data. In many cases an additional standard property exists, but in most cases the property is simply useless for HTML so does absolutely nothing. Below are the properties that will show invalid or useless data along with suggested replacements if any:
<table>
<thead>

99
lib/Document.php

@@ -143,13 +143,13 @@ class Document extends \DOMDocument implements Node {
public function __construct(\DOMDocument|string|null $source = null, ?string $encoding = null) {
parent::__construct();
-parent::registerNodeClass('DOMAttr', '\MensBeam\HTML\DOM\Attr');
-parent::registerNodeClass('DOMDocument', '\MensBeam\HTML\DOM\Document');
-parent::registerNodeClass('DOMComment', '\MensBeam\HTML\DOM\Comment');
-parent::registerNodeClass('DOMDocumentFragment', '\MensBeam\HTML\DOM\DocumentFragment');
-parent::registerNodeClass('DOMElement', '\MensBeam\HTML\DOM\Element');
-parent::registerNodeClass('DOMProcessingInstruction', '\MensBeam\HTML\DOM\ProcessingInstruction');
-parent::registerNodeClass('DOMText', '\MensBeam\HTML\DOM\Text');
+parent::registerNodeClass('DOMAttr', Attr::class);
+parent::registerNodeClass('DOMDocument', self::class);
+parent::registerNodeClass('DOMComment', Comment::class);
+parent::registerNodeClass('DOMDocumentFragment', DocumentFragment::class);
+parent::registerNodeClass('DOMElement', Element::class);
+parent::registerNodeClass('DOMProcessingInstruction', ProcessingInstruction::class);
+parent::registerNodeClass('DOMText', Text::class);
if ($source !== null) {
if (is_string($source)) {
@@ -163,6 +163,44 @@ class Document extends \DOMDocument implements Node {
}
+public function adoptNode(\DOMNode $node): Node|\DOMDocumentType|null {
+# The adoptNode(node) method steps are:
+#
+# 1. If node is a document, then throw a "NotSupportedError" DOMException.
+if ($node instanceof \DOMDocument) {
+throw new DOMException(DOMException::NOT_SUPPORTED);
+}
+# 2. If node is a shadow root, then throw a "HierarchyRequestError" DOMException.
+// DEVIATION: There is no scripting in this implementation.
+# 3. If node is a DocumentFragment node whose host is non-null, then return.
+if ($node instanceof DocumentFragment && $node->host !== null) {
+return null;
+}
+# 4. Adopt node into this.
+# To adopt a node into a document, run these steps:
+#
+# 1. Let oldDocument be node’s node document.
+$oldDocument = $node->ownerDocument;
+# 2. If node’s parent is non-null, then remove node.
+if ($node->parentNode !== null) {
+$node->parentNode->removeChild($node);
+}
+# 3. If document is not oldDocument, then:
+if ($this !== $oldDocument) {
+// DEVIATION: Steps 1 & 2 of the sub algorithm here all have to do with scripting
+# 3. For each inclusiveDescendant in node’s shadow-including inclusive descendants,
+# in shadow-including tree order, run the adopting steps with inclusiveDescendant
+# and oldDocument.
+$node = $this->importNode($node, true);
+}
+# 5. Return node.
+return $this->convertAdoptedImportedNodes($node);
+}
public function createAttribute(string $localName): ?Attr {
# The createAttribute(localName) method steps are:
# 1. If localName does not match the Name production in XML, then throw an
@@ -389,27 +427,7 @@ class Document extends \DOMDocument implements Node {
}
$node = parent::importNode($node, $deep);
-if ($node instanceof Element || $node instanceof DocumentFragment) {
-// Yet another PHP DOM hang-up that is either a bug or a feature. When
-// elements are imported their id attributes aren't able to be picked up by
-// NonElementParentNode::getElementById, so let's fix that.
-$elementsWithIds = $node->walk(function($n) {
-return ($n instanceof Element && $n->hasAttribute('id'));
-}, true);
-foreach ($elementsWithIds as $e) {
-$e->setIdAttributeNode($e->getAttributeNode('id'), true);
-}
-if ($node instanceof Element && !$node instanceof HTMLTemplateElement && $this->isHTMLNamespace($node) && strtolower($node->nodeName) === 'template') {
-$node = $this->convertTemplate($node);
-} else {
-$this->replaceTemplates($node);
-}
-}
-return $node;
+return $this->convertAdoptedImportedNodes($node);
}
public function load(string $filename, $options = null, ?string $encoding = null): bool {
@@ -433,7 +451,7 @@ class Document extends \DOMDocument implements Node {
}
}
}
if ($wrapperType === 'plainfile') {
$filename = realpath($filename);
$this->_URL = "file://$filename";
@@ -942,6 +960,29 @@ class Document extends \DOMDocument implements Node {
}
+private function convertAdoptedImportedNodes(\DOMDocumentType|Node $node): \DOMDocumentType|Node {
+if ($node instanceof Element || $node instanceof DocumentFragment) {
+// Yet another PHP DOM hang-up that is either a bug or a feature. When
+// elements are imported their id attributes aren't able to be picked up by
+// NonElementParentNode::getElementById, so let's fix that.
+$elementsWithIds = $node->walk(function($n) {
+return ($n instanceof Element && $n->hasAttribute('id'));
+}, true);
+foreach ($elementsWithIds as $e) {
+$e->setIdAttributeNode($e->getAttributeNode('id'), true);
+}
+if ($node instanceof Element && !$node instanceof HTMLTemplateElement && $this->isHTMLNamespace($node) && strtolower($node->nodeName) === 'template') {
+$node = $this->convertTemplate($node);
+} else {
+$this->replaceTemplates($node);
+}
+}
+return $node;
+}
private function convertTemplate(\DOMElement $element): \DOMElement {
if ($this->isHTMLNamespace($element) && strtolower($element->nodeName) === 'template') {
$template = $this->createElement($element->nodeName);

29
tests/cases/TestDocument.php

@@ -22,6 +22,27 @@ use MensBeam\HTML\Parser,
/** @covers \MensBeam\HTML\DOM\Document */
class TestDocument extends \PHPUnit\Framework\TestCase {
+/** @covers \MensBeam\HTML\DOM\Document::adoptNode */
+public function testAdoptNode() {
+$d = new Document();
+$t = $d->createElement('template');
+$d2 = new Document();
+$t2 = $d2->adoptNode($t, true);
+$this->assertSame($d2, $t2->ownerDocument);
+$this->assertNull($t->parentNode);
+/*$d = new \DOMDocument();
+$t = $d->createElement('template');
+// Add a child template to cover recursive template conversions.
+$t->appendChild($d->createElement('template'));
+$this->assertSame(\DOMElement::class, $t::class);
+$d2 = new Document();
+$t2 = $d2->importNode($t, true);
+$this->assertSame(HTMLTemplateElement::class, $t2::class);*/
+}
public function provideAttributeNodeCreation(): iterable {
return [
[ 'test', 'test' ],
@@ -341,24 +362,24 @@ class TestDocument extends \PHPUnit\Framework\TestCase {
/** @covers \MensBeam\HTML\DOM\Document::importNode */
-public function testImportingNodes() {
+public function testImportNode() {
$d = new Document();
$t = $d->createElement('template');
$d2 = new Document();
$t2 = $d2->importNode($t, true);
$this->assertFalse($t2->ownerDocument->isSameNode($t->ownerDocument));
-$this->assertSame(get_class($t2), get_class($t));
+$this->assertSame($t2::class, $t::class);
$d = new \DOMDocument();
$t = $d->createElement('template');
// Add a child template to cover recursive template conversions.
$t->appendChild($d->createElement('template'));
-$this->assertSame(\DOMElement::class, get_class($t));
+$this->assertSame(\DOMElement::class, $t::class);
$d2 = new Document();
$t2 = $d2->importNode($t, true);
-$this->assertSame(HTMLTemplateElement::class, get_class($t2));
+$this->assertSame(HTMLTemplateElement::class, $t2::class);
}
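
README item 6 in this commit notes that `Document::adoptNode` cannot replace the caller's original variable. The sketch below illustrates why, under some assumptions: it uses only PHP's built-in `DOMDocument` rather than this library, and `adoptByImport` is a hypothetical helper that mirrors the remove-then-import approach in the diff. It is not part of the library's API.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (an assumption for illustration): detach the node from
// its parent, then import it into the target document when the owners differ.
function adoptByImport(DOMDocument $target, DOMNode $node): DOMNode {
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
    if (!$node->ownerDocument->isSameNode($target)) {
        // importNode() returns a copy owned by $target. Reassigning the local
        // $node does not affect the variable held by the caller.
        $node = $target->importNode($node, true);
    }
    return $node;
}

$source = new DOMDocument();
$p = $source->createElement('p', 'hello');
$source->appendChild($p);

$target = new DOMDocument();
$adopted = adoptByImport($target, $p);

// The returned node belongs to the target document...
var_dump($adopted->ownerDocument->isSameNode($target));
// ...while the caller's original variable still refers to the detached node
// owned by the source document.
var_dump($p->ownerDocument->isSameNode($source));
var_dump($p->parentNode === null);
```

With an approach like this, callers would need to keep working with the return value (for example `$node = $doc->adoptNode($node);`) rather than the variable they passed in.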
