
Flesh documentation for serializer test format

The example is still incomplete
serialize
J. King 3 years ago
parent
commit
2889107844
  1. 84
      tests/cases/serializer/README.md


@ -1,7 +1,9 @@
HTML DOM serialization tests
============================
The format of these tests is essentially the format of html5lib's tree construction tests in reverse. There are, however, important differences, so the format is documented in full here.
The format of these tests is essentially the format of html5lib's tree
construction tests in reverse. There are, however, important differences,
so the format is documented in full here.
Each file containing tree construction tests consists of any number of
tests separated by two newlines (LF) and a single newline before the end
@ -15,9 +17,83 @@ of the file. For instance:
Where [TEST] is the following format:
Each test begins with a line reading "#document" or "#fragment"; subsequent
Each test begins with a line reading `#document` or `#fragment`; subsequent
lines represent the document or document fragment (respectively) used as
input, until a line is encountered which reads "#output", "#script-on",
or "#script-off".
input, until a line is encountered which reads `#output`, `#script-on`,
or `#script-off`.
Each DOM node in the input is written on its own line beginning with the
characters "| " (a vertical bar followed by a single space); lines which begin
with other characters are a continuation of the previous line. Attributes
are treated as distinct nodes and have their own entries. There is no escape
mechanism: all input is literal, including newlines and quotation marks. Two
spaces are used to denote each level of nesting. For example:
| node
|   child node
continuation of child node
|     grandchild node
|   child node
|     attribute node of child
|     grandchild node
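
As a rough illustration of the line rules above, the following Python sketch splits an input block into entries with a nesting depth. It is not the test suite's own loader: the function name `parse_entries` and the `(depth, text)` tuple shape are assumptions made for this example, and it assumes that the nesting spaces follow the `| ` prefix directly.

```python
def parse_entries(block: str) -> list[tuple[int, str]]:
    """Split an input block into (depth, text) entries.

    Assumes every node line starts with "| " followed by two spaces per
    nesting level; any other line continues the previous entry verbatim.
    """
    entries: list[tuple[int, str]] = []
    for line in block.split("\n"):
        if line.startswith("| "):
            body = line[2:]
            text = body.lstrip(" ")
            depth = (len(body) - len(text)) // 2  # two spaces per level
            entries.append((depth, text))
        elif entries:
            # Continuation: the newline is part of the literal data.
            depth, text = entries[-1]
            entries[-1] = (depth, text + "\n" + line)
        else:
            raise ValueError("continuation line before any node line")
    return entries


sample = "\n".join([
    "| node",
    "|   child node",
    "continuation of child node",
    "|     grandchild node",
])
print(parse_entries(sample))
```

Because there is no escape mechanism, a continuation line that itself begins with `| ` cannot be told apart from a new node; this sketch treats it as a new node.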
The different types of nodes are:
- Element nodes in the form `<body>` for an element in the HTML namespace,
or `<svg svg>` for an element in a foreign namespace. Qualified names are
written as usual e.g. `<math math:math>`, though such elements are not
produced by the parser
- Attribute nodes in the form `id="value"` or e.g. `xml xml:id="value"`, with
a quotation mark immediately followed by a newline marking the end of the
attribute value (in other words, attribute values may contain literal
quotation marks)
- Text nodes in the form `"text data"`; like attributes, only a quotation mark
followed by a newline marks the end of text data
- Comment nodes of the form `<!-- comment data -->`; the space characters are
padding and are not part of the comment data
- Document type nodes in the form `<!DOCTYPE html "public" "system">`, or
`<!DOCTYPE html>` or simply `<!DOCTYPE>` depending on its contents
- Processing instructions in the form `<?target PI data>`. Processing
instructions are not generated by the HTML parser, but may appear in
documents by other means
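
The node forms listed above can be told apart by their leading and trailing characters. The Python sketch below shows one possible way to do so; the function names and the returned labels are hypothetical, and the checks are deliberately simple (for instance, an attribute whose name starts with `<` would be misclassified).

```python
def classify(text: str) -> str:
    """Return a label for one entry's text, based on the forms listed above."""
    if text.startswith("<!--") and text.endswith("-->"):
        return "comment"
    if text.startswith("<!DOCTYPE"):
        return "doctype"
    if text.startswith("<?"):
        return "processing-instruction"
    if text.startswith("<") and text.endswith(">"):
        return "element"
    if text.startswith('"') and text.endswith('"'):
        return "text"
    if '="' in text and text.endswith('"'):
        return "attribute"
    raise ValueError(f"unrecognized node: {text!r}")


def comment_data(text: str) -> str:
    """Strip the "<!-- " and " -->" padding from a comment entry."""
    return text[5:-4]


print(classify("<svg svg>"))
print(classify('style="font-family: "Times New Roman""'))
print(comment_data("<!-- comment data -->"))
```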
Namespaces are represented by the following short names:
| Name | URL |
|-------|--------------------------------------|
| xml | http://www.w3.org/XML/1998/namespace |
| xmlns | http://www.w3.org/2000/xmlns/ |
| xlink | http://www.w3.org/1999/xlink |
| math | http://www.w3.org/1998/Math/MathML |
| svg | http://www.w3.org/2000/svg |
Other namespaces may also appear; these should be interpreted as literal URLs.
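
A test harness might resolve these short names with a simple lookup that falls back to the literal value. The following Python sketch assumes that approach; `resolve_namespace` and `parse_element` are hypothetical names, and `None` is used here as a stand-in for the HTML namespace.

```python
from typing import Optional

# Short names from the table above.
NAMESPACES = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xmlns": "http://www.w3.org/2000/xmlns/",
    "xlink": "http://www.w3.org/1999/xlink",
    "math": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
}


def resolve_namespace(name: str) -> str:
    """Map a short name to its URL; anything else is taken as a literal URL."""
    return NAMESPACES.get(name, name)


def parse_element(text: str) -> tuple[Optional[str], str]:
    """Split "<body>" or "<svg svg>" into (namespace URL or None, name)."""
    inner = text[1:-1]
    prefix, sep, name = inner.partition(" ")
    if not sep:
        return None, inner  # no namespace given: HTML namespace
    return resolve_namespace(prefix), name


print(parse_element("<body>"))
print(parse_element("<math math:math>"))
```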
After the input block either `#script-on` or `#script-off` may appear. These
signal that the test should be run with scripting on or off, respectively. If
neither line is present, the test should be run in both modes.
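
In a test runner this rule reduces to choosing which scripting modes to exercise. A minimal Python sketch, using a hypothetical `scripting_modes` helper that receives the marker line, or `None` when neither marker is present:

```python
from typing import Optional


def scripting_modes(marker: Optional[str]) -> list[bool]:
    """Return the scripting settings a test should be run with."""
    if marker == "#script-on":
        return [True]
    if marker == "#script-off":
        return [False]
    if marker is None:
        return [True, False]  # no marker: run in both modes
    raise ValueError(f"unexpected marker: {marker!r}")


print(scripting_modes(None))
```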
Finally, `#output` marks the beginning of output. All subsequent text is
literal characters until two consecutive newlines followed by either
`#document` or `#fragment` are seen.
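
Taken together, the section markers suggest a straightforward way to split a file into tests and each test into its parts. The Python sketch below is one possible reading of the format, not the project's actual loader; `split_tests` and `split_test` are hypothetical names, and the sample data is invented for illustration.

```python
import re
from typing import Optional


def split_tests(data: str) -> list[str]:
    """Split file contents into raw tests."""
    # Drop the single newline that terminates the file, then split wherever
    # a blank line is followed by a "#document" or "#fragment" line.
    data = data.removesuffix("\n")
    return re.split(r"\n\n(?=#(?:document|fragment)\n)", data)


def split_test(test: str) -> tuple[str, str, Optional[str], str]:
    """Return (kind, input block, script marker or None, output)."""
    lines = test.split("\n")
    kind = lines[0]
    if kind not in ("#document", "#fragment"):
        raise ValueError(f"unexpected first line: {kind!r}")
    for i, line in enumerate(lines[1:], start=1):
        if line in ("#output", "#script-on", "#script-off"):
            break
    else:
        raise ValueError("no #output section found")
    input_block = "\n".join(lines[1:i])
    marker = None
    if lines[i] != "#output":
        marker = lines[i]
        i += 1
        if i >= len(lines) or lines[i] != "#output":
            raise ValueError("script marker not followed by #output")
    output = "\n".join(lines[i + 1:])  # everything after "#output" is literal
    return kind, input_block, marker, output


data = (
    "#document\n| <div>\n#script-on\n#output\n<div></div>\n"
    "\n"
    '#fragment\n| "a\nb"\n#output\na\nb\n'
)
for raw in split_tests(data):
    print(split_test(raw))
```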
Below is a complete example:
#document
| <!-- This is longer than most tests -->
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
| <html>
| lang="en"
| <head>
| <body>
| style="font-family: "Times New Roman""
| <svg svg>
| xml xml:id="image"
| <div>
| "This is a text node.
It has an embedded newline. It is in fact pretty "busy" and has
multiple newlines.
And even a blank line."
| <!-- This comment also
has a newline -->
