@@ -8,36 +8,51 @@ A modern, accurate HTML parser and serializer for PHP.
```php
public static MensBeam\HTML\Parser::parse(
    string $data,
    ?string $encodingOrContentType = null,
    ?MensBeam\HTML\Parser\Config $config = null
): MensBeam\HTML\Parser\Output
```
The `MensBeam\HTML\Parser::parse` static method is used to parse documents. It takes an arbitrary string (and, optionally, its encoding or an HTTP `Content-Type` header value) as input and returns a `MensBeam\HTML\Parser\Output` object. The `Output` object has the following properties:
- `document`: A `DOMDocument` object representing the parsed document
- `encoding`: The original character encoding of the document, as supplied by the user or otherwise detected during parsing
- `quirksMode`: The detected "quirks mode" property of the document. This will be one of `Parser::NO_QUIRKS_MODE` (`0`), `Parser::QUIRKS_MODE` (`1`), or `Parser::LIMITED_QUIRKS_MODE` (`2`)
- `errors`: An array containing the list of parse errors emitted during processing if parse error reporting was turned on (see **Configuration** below), or `null` otherwise
Extra configuration parameters may be given to the parser by passing a `MensBeam\HTML\Parser\Config` object as the final `$config` argument. See the **Configuration** section below for more details.
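A minimal sketch of calling `parse()` and reading the `Output` properties listed above. It assumes the library is installed with Composer and autoloaded from `vendor/autoload.php`; the HTML string is illustrative only, and no `Config` object is passed, so the defaults apply.

```php
<?php
// Assumption: the library is installed with Composer and autoloaded from vendor/
require __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

$html = '<!DOCTYPE html><title>Example</title><p>Hello, world!</p>';

// The second argument may be a bare encoding label or an HTTP Content-Type value
$out = Parser::parse($html, 'text/html; charset=UTF-8');

// $out->document is a \DOMDocument representing the parsed document
$title = $out->document->getElementsByTagName('title')->item(0);
echo $title->textContent, "\n";

// The encoding that was supplied or detected during parsing
echo $out->encoding, "\n";

// A document starting with <!DOCTYPE html> should not trigger quirks mode
var_dump($out->quirksMode === Parser::NO_QUIRKS_MODE);
```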
### Parsing into existing documents
```php
public static MensBeam\HTML\Parser::parseInto(
    string $data,
    \DOMDocument $document,
    ?string $encodingOrContentType = null,
    ?MensBeam\HTML\Parser\Config $config = null
): MensBeam\HTML\Parser\Output
```
The `MensBeam\HTML\Parser::parseInto` static method is used to parse into an existing document. The supplied document must be an instance of `\DOMDocument` (or of a class derived from it) and must be empty. All other arguments are identical to those used when parsing documents normally.
*NOTE:* The `documentClass` configuration option has no effect when using this method.
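A minimal sketch of `parseInto()`, assuming the same Composer setup. `AppDocument` is a hypothetical subclass used only to show that a class derived from `\DOMDocument` is accepted; a freshly constructed document is used because the target must be empty.

```php
<?php
// Assumption: the library is installed with Composer and autoloaded from vendor/
require __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// Hypothetical subclass, used only to show that derived document classes are accepted
class AppDocument extends \DOMDocument {}

// The target document must be empty, so use a newly constructed one
$doc = new AppDocument();

$out = Parser::parseInto('<p>Parsed into an existing document</p>', $doc);

// The parsed markup now lives in $doc itself
echo $doc->documentElement->nodeName, "\n";
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```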
### Parsing fragments
```php
public static MensBeam\HTML\Parser::parseFragment(
    DOMElement $contextElement,
    ?int $quirksMode,
    string $data,
    ?string $encodingOrContentType = null,
    ?MensBeam\HTML\Parser\Config $config = null
): DOMDocumentFragment
```
The `MensBeam\HTML\Parser::parseFragment` static method is used to parse document fragments. The primary use case for this method is in the implementation of the `innerHTML` setter of HTML elements. Consequently, a context element is required, as well as the "quirks mode" property of the context element's document (which must be one of `Parser::NO_QUIRKS_MODE` (`0`), `Parser::QUIRKS_MODE` (`1`), or `Parser::LIMITED_QUIRKS_MODE` (`2`)). The remaining arguments are identical to those used when parsing documents.
If the "quirks mode" property of the document is not known, using `Parser::NO_QUIRKS_MODE` (`0`) is usually the best choice.
Unlike the `parse()` method, the `parseFragment()` method returns a `DOMDocumentFragment` object belonging to `$contextElement`'s owner document.
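A minimal sketch of `parseFragment()` used the way an `innerHTML` setter might use it, assuming the same Composer setup. The host document and fragment markup are illustrative; the context element's quirks mode is taken from the `Output` of the host document's own parse.

```php
<?php
// Assumption: the library is installed with Composer and autoloaded from vendor/
require __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// Parse a host document to obtain a context element and its quirks mode
$out = Parser::parse('<!DOCTYPE html><ul></ul>');
$list = $out->document->getElementsByTagName('ul')->item(0);

// Parse markup as if it were assigned to the <ul>'s innerHTML; the first
// <li> is closed implicitly by the second, as in a full document
$fragment = Parser::parseFragment($list, $out->quirksMode, '<li>One<li>Two');

// The fragment belongs to $list's owner document, so it can be appended directly
$list->appendChild($fragment);

echo $list->childNodes->length, "\n";
```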
@@ -41,17 +41,17 @@ class Parser extends Serializer {
];
/** Parses a string to produce a document object
*
* @param string $data The string to parse. This may be in any valid encoding
* @param string|null $encodingOrContentType The document encoding, or HTTP Content-Type header value, if known. If not provided, encoding detection will be attempted
* @param \MensBeam\HTML\Parser\Config|null $config The configuration parameters to use, if any
*/
public static function parse(string $data, ?string $encodingOrContentType = null, ?Config $config = null): Output {
return static::parseDocumentOrFragment($data, $encodingOrContentType, null, null, null, $config ?? new Config);
}
/** Parses a string to produce a partial document (a document fragment)
*
* @param \DOMElement $contextElement The context element. The fragment will be parsed as if it is a collection of children of this element
* @param int|null $quirksMode The "quirks mode" property of the context element's document. Must be one of Parser::NO_QUIRKS_MODE, Parser::LIMITED_QUIRKS_MODE, or Parser::QUIRKS_MODE
* @param string $data The string to parse. This may be in any valid encoding
@@ -60,7 +60,7 @@ class Parser extends Serializer {
*/
public static function parseFragment(\DOMElement $contextElement, ?int $quirksMode, string $data, ?string $encodingOrContentType = null, ?Config $config = null): \DOMDocumentFragment {
// parse the fragment into a temporary document
$out = self::parseDocumentOrFragment($data, $encodingOrContentType, null, $contextElement, $quirksMode, $config ?? new Config);
throw new Exception(Exception::FAILED_CREATING_DOCUMENT, [$config->documentClass], $e);
}
if (!$document instanceof \DOMDocument) {
throw new Exception(Exception::INVALID_DOCUMENT_CLASS, [get_class($document)]);
/** Parses a string into an existing document object
*
* @param string $data The string to parse. This may be in any valid encoding
* @param \DOMDocument $document The document to parse into. Must be an instance of or derived from \DOMDocument and must be empty
* @param string|null $encodingOrContentType The document encoding, or HTTP Content-Type header value, if known. If not provided, encoding detection will be attempted
* @param \MensBeam\HTML\Parser\Config|null $config The configuration parameters to use, if any
*/
public static function parseInto(string $data, \DOMDocument $document, ?string $encodingOrContentType = null, ?Config $config = null): Output {
return static::parseDocumentOrFragment($data, $encodingOrContentType, $document, null, null, $config ?? new Config);