Document DOMParser
parent 6b863a1a85
commit d499fac607
1 changed files with 22 additions and 2 deletions
24
README.md
@@ -14,15 +14,35 @@ public static MensBeam\HTML\Parser::parse(
 ): MensBeam\HTML\Parser\Output
 ```

-The `MensBeam\HTML\Parser::parse` static method is used to parse documents. An arbitrary string (and optional encoding) are taken as input, and a `MensBeam\HTML\Parser\Output` object is returned as output. The `Output` object has the following properties:
+The `MensBeam\HTML\Parser::parse` static method is used to parse documents. An arbitrary string and optional encoding are taken as input, and a `MensBeam\HTML\Parser\Output` object is returned as output. The `Output` object has the following properties:

-- `document`: A string `DOMDocument` object representing the parsed document
+- `document`: A `DOMDocument` object representing the parsed document
 - `encoding`: The original character encoding of the document, as supplied by the user or otherwise detected during parsing
 - `quirksMode`: The detected "quirks mode" property of the document. This will be one of `Parser::NO_QUIRKS_MODE` (`0`), `Parser::QUIRKS_MODE` (`1`), or `Parser::LIMITED_QUIRKS_MODE` (`2`)
 - `errors`: An array containing the list of parse errors emitted during processing if parse error reporting was turned on (see **Configuration** below), or `null` otherwise

 Extra configuration parameters may be given to the parser by passing a `MensBeam\HTML\Parser\Config` object as the final `$config` argument. See the **Configuration** section below for more details.

+### Parsing with `DOMParser`
+
+Since version 1.3.0, the library also provides an implementation of [the `DOMParser` interface](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-parsing-and-serialization).
+
+```php
+class MensBeam\HTML\DOMParser {
+    public function parseFromString(
+        string $string,
+        string $type
+    ): \DOMDocument
+}
+```
+
+Like the standard interface, it will parse either HTML or XML documents. This implementation does, however, differ in the following ways:
+
+- Any XML MIME content-type (e.g. `application/rss+xml`) is acceptable, not just the restricted list mandated by the interface
+- MIME content-types may include a `charset` parameter to specify an authoritative encoding of the document
+- If no `charset` is provided, the encoding will be detected from document hints; the default encoding is `windows-1252` for HTML and `UTF-8` for XML
+- `InvalidArgumentException` is thrown in place of JavaScript's `TypeError`
+
 ### Parsing into existing documents

 ```php
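
As a usage note on the `DOMParser` documentation added in this commit, here is a minimal sketch of how the new class might be called. It assumes the library is installed through Composer and autoloaded from `vendor/autoload.php`, and that `DOMParser` can be constructed without arguments; neither detail is stated in the diff. The sample documents and the `text/plain` probe are illustrative only.

```php
<?php
declare(strict_types=1);

// Assumption: Composer autoloading; adjust the path for your project.
require __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\DOMParser;

// Assumption: the class has no required constructor arguments.
$parser = new DOMParser();

// HTML input; per the README, the charset parameter is treated as
// the authoritative encoding of the document.
$html = $parser->parseFromString(
    '<!DOCTYPE html><title>Example</title><p>Hello</p>',
    'text/html; charset=utf-8'
);

// XML input with a MIME type outside the standard DOMParser list.
$xml = $parser->parseFromString(
    '<?xml version="1.0"?><rss version="2.0"><channel><title>Feed</title></channel></rss>',
    'application/rss+xml'
);

// Both calls return ordinary DOMDocument objects.
echo $html->getElementsByTagName('title')->item(0)->textContent, "\n";
echo $xml->documentElement->nodeName, "\n";

// An unsupported type is expected to raise InvalidArgumentException,
// where a browser would throw TypeError.
try {
    $parser->parseFromString('plain text', 'text/plain');
} catch (\InvalidArgumentException $e) {
    echo 'Rejected type: ', $e->getMessage(), "\n";
}
```

Because the return value is a plain `\DOMDocument`, the result can be handed to existing DOM or XPath code without any library-specific wrapper.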