
Add documentation

master 0.1.0
J. King 11 months ago
parent
commit
2613cb5802
  1. 71
      README.md
  2. 6
      lib/Microformats.php
  3. 2
      lib/Microformats/Parser.php

71
README.md

@ -1,11 +1,78 @@
# Microformats
A generic [Microformats](https://microformats.io/) parser for PHP. While it is similar to [php-mf2](https://github.com/microformats/php-mf2), it combines a more accurate HTML parser with more consistent performance characteristics, and passes tests which the other library does not pass.
A generic [Microformats](https://microformats.io/) parser for PHP. While it is similar to [php-mf2](https://github.com/microformats/php-mf2), it combines a more accurate HTML parser with more consistent performance characteristics, and it is believed to have fewer bugs.
## Usage
Functionality is provided for parsing from a file, from a string, and from an HTML element (a `\DOMElement` object), as well as for serializing to JSON.
Functionality is provided for parsing from an HTTP URL, from a file, from a string, and from an HTML element (a `\DOMElement` object), as well as for serializing to JSON. A static method of the `MensBeam\Microformats` class is provided for each task.
The parsing methods all return a Microformats structure as an array. The [Microformats wiki](https://microformats.org/wiki/microformats2) includes some sample structures in JSON format.
### Parsing from a URL
```php
\MensBeam\Microformats::fromUrl(string $url, array $options = []): ?array
```
The `$url` argument is an HTTP(S) URL to an HTML resource; redirections will be followed if necessary. If the resource cannot be fetched, `null` will be returned.
The `$options` argument is a list of options for the Microformats parser. See below for details.
### Parsing from a file
```php
\MensBeam\Microformats::fromFile(string $file, string $contentType, string $url, array $options = []): ?array
```
The `$file` argument is the path to a local file. If the file cannot be opened for reading, `null` will be returned.
The `$contentType` argument is a string containing the value of the file's HTTP `Content-Type` header, if known. This may be used to provide the HTML parser with character encoding information.
The `$url` argument is a string containing the file's effective URL. This is used to resolve any relative URLs in the input.
The `$options` argument is a list of options for the Microformats parser. See below for details.
### Parsing from a string
```php
\MensBeam\Microformats::fromString(string $input, string $contentType, string $url, array $options = []): array
```
The `$input` argument is the string to parse for microformats.
The `$contentType` argument is a string containing the value of the string's HTTP `Content-Type` header, if known. This may be used to provide the HTML parser with character encoding information.
The `$url` argument is a string containing the string's effective URL. This is used to resolve any relative URLs in the input.
The `$options` argument is a list of options for the Microformats parser. See below for details.
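A minimal usage sketch, assuming the library is installed and autoloadable (for example through Composer). The wrapper name `parseSnippet`, the sample markup, and the `https://example.com/` base URL are illustrative only and not part of the library:

```php
<?php
/**
 * Hypothetical wrapper: parses an HTML string that is known to be UTF-8.
 * Relative URLs in the input are resolved against $baseUrl.
 */
function parseSnippet(string $html, string $baseUrl): array {
    // The Content-Type value tells the HTML parser which encoding to assume.
    return \MensBeam\Microformats::fromString($html, 'text/html; charset=utf-8', $baseUrl);
}

// Usage (requires the library to be loaded):
// $mf = parseSnippet('<a class="h-card" href="/about">Jane Doe</a>', 'https://example.com/');
// echo \MensBeam\Microformats::toJSON($mf, JSON_PRETTY_PRINT);
```

Unlike `fromUrl` and `fromFile`, this method is declared to return `array` rather than `?array`, so no `null` check is needed on the result.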
### Parsing from an HTML element
```php
\MensBeam\Microformats::fromHTMLElement(\DOMElement $input, string $url, array $options = []): array
```
The `$input` argument is the element to parse for microformats. Typically this would be the `documentElement`, but any element may be parsed.
The `$url` argument is a string containing the effective URL of the element's document. This is used to resolve any relative URLs in the input.
The `$options` argument is a list of options for the Microformats parser. See below for details.
### Serializing to JSON
```php
\MensBeam\Microformats::toJSON(array $data, int $flags = 0, int $depth = 512): string
```
Since Microformats data is represented as a structure of nested arrays, some of which are associative ("objects" in JSON parlance) and may be empty, it is necessary to convert such empty arrays into PHP `stdClass` objects before they are serialized to JSON. This method performs these conversions before passing the result to [the `json_encode` function](https://www.php.net/manual/en/function.json-encode). Its parameters are the same as those of `json_encode`.
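The underlying PHP behaviour can be seen with plain `json_encode`. The sketch below only illustrates the problem and the `stdClass` cast; it is not the library's own conversion code, and the sample item is made up for the example:

```php
<?php
// "properties" is an object in Microformats JSON, but an empty PHP array
// carries no hint of that, so json_encode() emits it as an empty list.
$item = ['type' => ['h-card'], 'properties' => []];
$naive = json_encode($item);

// Casting the empty array to stdClass first yields an empty JSON object.
$item['properties'] = (object) $item['properties'];
$fixed = json_encode($item);

echo $naive, "\n"; // {"type":["h-card"],"properties":[]}
echo $fixed, "\n"; // {"type":["h-card"],"properties":{}}
```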
## Options
The parsing methods all optionally take an `$options` array as an argument. These options are all flags, either for experimental features, or for backwards-compatible features no longer used by default. The options are as follows:
| Key | Type | Default | Description
|--------------|---------|---------|------------
| `impliedTz` | Boolean | `false` | Time values in microformats may have an implied date associated with them taken from a prior date value in the same microformat structure. This option allows for a time zone to be implied as well, if a time does not include its time zone.
| `lang` | Boolean | `false` | This option determines whether language information is retrieved from the parsed document and included in the output, in `lang` keys. Both Microformat structures and embedded markup (`e-` property) structures are affected by this option.
| `simpleTrim` | Boolean | `false` | This option uses the "classic", simpler whitespace-trimming algorithm rather than the more aggressive one proposed for future standardization, which this library uses by default. This affects both `p-` and `e-` properties.
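
As a sketch of how an `$options` array might be assembled, the snippet below overlays caller choices on the defaults listed in the table above. The `DEFAULT_OPTIONS` constant and the `buildOptions` helper are hypothetical names for illustration; how the library itself treats missing or unknown keys is not documented here, so this is an assumption rather than a description of its behaviour:

```php
<?php
// Option keys and default values as listed in the table above.
const DEFAULT_OPTIONS = [
    'impliedTz'  => false,
    'lang'       => false,
    'simpleTrim' => false,
];

/**
 * Hypothetical helper: overlays the caller's choices on the documented
 * defaults and discards keys that are not in the table.
 */
function buildOptions(array $overrides): array {
    $known = array_intersect_key($overrides, DEFAULT_OPTIONS);
    return array_merge(DEFAULT_OPTIONS, $known);
}

// Enable language information; the misspelled key is dropped.
$options = buildOptions(['lang' => true, 'simpletrim' => true]);

// $options can then be passed as the last argument of any parsing method, e.g.
// \MensBeam\Microformats::fromString($html, $contentType, $url, $options);
```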

6
lib/Microformats.php

@ -92,7 +92,7 @@ class Microformats {
*/
public static function fromString(string $input, string $contentType, string $url, array $options = []): array {
$parsed = HTMLParser::parse($input, $contentType);
return static::fromHTMLElement($parsed->document->documentElement, $url, $options);
return static::fromHtmlElement($parsed->document->documentElement, $url, $options);
}
/** Parses an HTML element for microformats
@ -101,8 +101,8 @@ class Microformats {
* @param string $url The effective URL (after redirections) of the document if known
* @param array $options Options for the parser; please see the class documentation for details
*/
public static function fromHTMLElement(\DOMElement $input, string $url, array $options = []): array {
return (new MfParser)->parseHTMLElement($input, $url, $options);
public static function fromHtmlElement(\DOMElement $input, string $url, array $options = []): array {
return (new MfParser)->parseHtmlElement($input, $url, $options);
}
/** Serializes a Microformats structure to JSON.

2
lib/Microformats/Parser.php

@ -253,7 +253,7 @@ class Parser {
* @param string $baseUrl The base URL against which to resolve relative URLs in the output
* @param array $options An associative array of options. Please see the class documentation for more details
*/
public function parseHTMLElement(\DOMElement $node, string $baseUrl = "", ?array $options = null): array {
public function parseHtmlElement(\DOMElement $node, string $baseUrl = "", ?array $options = null): array {
$root = $node;
// normalize options
$this->options = $this->normalizeOptions($options ?? []);
