HTML DOM serialization tests
============================

The format of these tests is essentially the format of html5lib's tree
construction tests in reverse: the input is a DOM tree and the expected output
is its serialization. There are, however, important differences, so the
format is documented in full here.

Each file containing serialization tests consists of any number of tests
separated by two newlines (LF), with a single newline before the end of the
file. For instance:

    [TEST]LF
    LF
    [TEST]LF
    LF
    [TEST]LF

Each [TEST] has the following format:

Each test begins with a line reading `#document` or `#fragment`; the
subsequent lines represent the document or document fragment (respectively)
used as input, up to a line that reads `#output`, `#script-on`, or
`#script-off`.

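A minimal Python sketch of reading the header and input block of one test, assuming the test has already been split into lines with line endings removed. `read_input_block` and `TERMINATORS` are hypothetical names, not part of any existing harness.

```python
from typing import List, Tuple

# Lines that can end the input block of a test.
TERMINATORS = ("#output", "#script-on", "#script-off")


def read_input_block(lines: List[str]) -> Tuple[str, List[str], int]:
    """Split off the header and input block of a single test.

    `lines` holds the lines of one test with line endings removed. Returns
    the test kind ("document" or "fragment"), the raw input lines, and the
    index of the line that ended the input block.
    """
    if not lines or lines[0] not in ("#document", "#fragment"):
        raise ValueError("a test must start with #document or #fragment")
    kind = lines[0][1:]  # drop the leading "#"
    for i in range(1, len(lines)):
        if lines[i] in TERMINATORS:
            return kind, lines[1:i], i
    raise ValueError("input block is not followed by #output or a script flag")
```

For example, `read_input_block(["#fragment", "| <p>", "#output", "<p></p>"])` returns `("fragment", ["| <p>"], 2)`.
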
Each DOM node in the input is written on its own line beginning with the
characters "| " (a vertical bar followed by a single space); lines which begin
with any other characters are a continuation of the previous line. Attributes
are treated as distinct nodes and have their own entries. There is no escape
mechanism: all input is literal, including newlines and quotation marks. Each
level of nesting is denoted by two further spaces after the "| " prefix. For
example:

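A minimal Python sketch of these line rules, assuming the nesting indentation follows the `| ` prefix (as in html5lib's tree construction tests). `parse_node_lines` is a hypothetical helper; it turns raw input lines into `(depth, node_text)` pairs and rejoins continuation lines with the literal newlines they stood for.

```python
from typing import List, Tuple


def parse_node_lines(lines: List[str]) -> List[Tuple[int, str]]:
    """Group input lines into (depth, node_text) entries.

    A line starting with "| " begins a new node; after that prefix, every
    two spaces of indentation add one level of nesting. Any other line
    (including an empty one) continues the previous node.
    """
    entries: List[Tuple[int, str]] = []
    for line in lines:
        if line.startswith("| "):
            body = line[2:]
            stripped = body.lstrip(" ")
            indent = len(body) - len(stripped)
            entries.append((indent // 2, stripped))
        elif entries:
            depth, text = entries[-1]
            # No escaping: the line break itself is part of the node's data.
            entries[-1] = (depth, text + "\n" + line)
        else:
            raise ValueError("continuation line before any node")
    return entries
```

Because there is no escape mechanism, an empty line inside a text node is simply another continuation line and survives as a blank line in the node's data.
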
    | node
    |   child node
    continuation of child node
    |     grandchild node
    |   child node
    |     attribute node of child
    |     grandchild node

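Given `(depth, text)` pairs for each entry, where depth is the number of two-space nesting levels, the tree in the example above can be rebuilt with a stack. This is a sketch: `Node` and `build_tree` are hypothetical names, and attribute entries are kept as ordinary children here, whereas a real harness would attach them to their element.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(entries: List[Tuple[int, str]]) -> List[Node]:
    """Turn (depth, text) pairs into a list of top-level Node objects.

    stack[d] holds the most recent node seen at depth d, so a node at
    depth d + 1 becomes a child of stack[d].
    """
    roots: List[Node] = []
    stack: List[Node] = []
    for depth, text in entries:
        if depth > len(stack):
            raise ValueError(f"nesting jumps to depth {depth}")
        node = Node(text)
        del stack[depth:]  # close every node at this depth or deeper
        if depth == 0:
            roots.append(node)
        else:
            stack[-1].children.append(node)
        stack.append(node)
    return roots
```

A jump of more than one level (for example from depth 0 straight to depth 2) is rejected, since the format gives no parent for such an entry.
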
The different types of nodes are:

- Element nodes in the form `<body>` for an element in the HTML namespace,
  or `<svg svg>` for an element in a foreign namespace (the namespace's short
  name, a space, then the element's name). Qualified names are written as
  usual, e.g. `<math math:math>`, though such elements are not produced by
  the HTML parser
- Attribute nodes in the form `id="value"` or, for a namespaced attribute,
  e.g. `xml xml:id="value"`, with a quotation mark immediately followed by a
  newline marking the end of the attribute value (in other words, attribute
  values may contain literal quotation marks)
- Text nodes in the form `"text data"`; as with attributes, only a quotation
  mark immediately followed by a newline marks the end of the text data
- Comment nodes in the form `<!-- comment data -->`; the space characters are
  padding and are not part of the comment data
- Document type nodes in the form `<!DOCTYPE html "public" "system">`,
  `<!DOCTYPE html>`, or simply `<!DOCTYPE>`, depending on which of the name,
  public identifier, and system identifier the node has
- Processing instructions in the form `<?target PI data>`. Processing
  instructions are not generated by the HTML parser, but may appear in
  documents by other means

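Once the `| ` prefix and indentation have been removed, the node types listed above can be told apart by their leading and trailing characters. A hypothetical classifier, offered as a sketch rather than a complete validator:

```python
def node_kind(text: str) -> str:
    """Classify the text of one node entry by its delimiters.

    The "<!DOCTYPE", "<!--" and "<?" checks must come before the generic
    "<" check, or those nodes would be mistaken for elements. Text nodes
    are checked before attributes because text data may contain '="'.
    """
    if text.startswith("<!DOCTYPE"):
        return "doctype"
    if text.startswith("<!--") and text.endswith("-->"):
        return "comment"
    if text.startswith("<?") and text.endswith(">"):
        return "processing-instruction"
    if text.startswith("<") and text.endswith(">"):
        return "element"
    if text.startswith('"') and text.endswith('"'):
        return "text"
    if '="' in text and text.endswith('"'):
        return "attribute"
    raise ValueError(f"unrecognised node: {text!r}")
```
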
Namespaces are represented by the following short names:

| Name  | URL                                  |
|-------|--------------------------------------|
| xml   | http://www.w3.org/XML/1998/namespace |
| xmlns | http://www.w3.org/2000/xmlns/        |
| xlink | http://www.w3.org/1999/xlink         |
| math  | http://www.w3.org/1998/Math/MathML   |
| svg   | http://www.w3.org/2000/svg           |

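A sketch of resolving element and attribute names against this table. Two assumptions go beyond the text: unprefixed elements get the standard HTML namespace URL `http://www.w3.org/1999/xhtml`, and unprefixed attributes are in no namespace (`None`). The helper names `resolve`, `parse_element`, and `parse_attribute` are hypothetical.

```python
from typing import Optional, Tuple

# Short names from the namespace table.
NAMESPACES = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xmlns": "http://www.w3.org/2000/xmlns/",
    "xlink": "http://www.w3.org/1999/xlink",
    "math": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
}
HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"  # assumed default for elements


def resolve(name_part: str, default_ns: Optional[str]) -> Tuple[Optional[str], str]:
    """Split "ns qualified:name" into (namespace URL, qualified name).

    Without a namespace token, default_ns is used. A token that is not a
    known short name is taken as a literal namespace URL.
    """
    ns_token, sep, qname = name_part.partition(" ")
    if not sep:
        return default_ns, name_part
    return NAMESPACES.get(ns_token, ns_token), qname


def parse_element(text: str) -> Tuple[Optional[str], str]:
    # "<svg svg>" -> strip the angle brackets, then resolve the name.
    return resolve(text[1:-1], HTML_NAMESPACE)


def parse_attribute(text: str) -> Tuple[Optional[str], str, str]:
    # The value runs from the first '="' to the final quotation mark,
    # so it may itself contain quotation marks.
    eq = text.index('="')
    ns, qname = resolve(text[:eq], None)
    return ns, qname, text[eq + 2:-1]
```

With this sketch, `parse_attribute('style="font-family: "Times New Roman""')` keeps the inner quotation marks in the value.
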
Other namespaces may also appear; these should be interpreted as literal URLs.

After the input block, either `#script-on` or `#script-off` may appear. These
signal that the test should be run with scripting on or off, respectively.
This matters for serialization because, for example, text inside a
`<noscript>` element is emitted without escaping when scripting is enabled.
If neither line is present, the test should be run in both modes.

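A minimal sketch of handling the optional scripting flag, assuming the caller already knows the index of the line that ended the input block. `read_flags` is a hypothetical name; it returns the scripting settings to test (`True` meaning scripting on) and the index of the first line of expected output.

```python
from typing import List, Tuple


def read_flags(lines: List[str], start: int) -> Tuple[List[bool], int]:
    """Read an optional scripting flag followed by the "#output" line.

    lines[start] is the line that ended the input block.
    """
    modes = [True, False]  # no flag: run the test in both modes
    i = start
    if i < len(lines) and lines[i] == "#script-on":
        modes, i = [True], i + 1
    elif i < len(lines) and lines[i] == "#script-off":
        modes, i = [False], i + 1
    if i >= len(lines) or lines[i] != "#output":
        raise ValueError("expected #output after the input block")
    return modes, i + 1
```

A runner would then serialize the input once per entry in `modes` and compare each result with the same expected output.
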
Finally, `#output` marks the beginning of the expected output. All subsequent
text is literal characters until two consecutive newlines followed by either
`#document` or `#fragment` (the start of the next test) are seen, or the end
of the file is reached.

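Splitting a whole file into tests follows from this rule: a boundary is two newlines only when a `#document` or `#fragment` header comes next. A sketch using Python's `re` module, where `split_tests` is a hypothetical helper:

```python
import re
from typing import List

# A new test starts only where a blank line is followed by a #document or
# #fragment header, so blank lines inside text nodes or output are kept.
_TEST_BOUNDARY = re.compile(r"\n\n(?=#(?:document|fragment)\n)")


def split_tests(data: str) -> List[str]:
    """Split the contents of one test file into individual test strings."""
    if data.endswith("\n"):
        data = data[:-1]  # the single newline before the end of the file
    return _TEST_BOUNDARY.split(data) if data else []
```

Because of the lookahead, a blank line inside a multi-line text node, such as the one in the complete example below, does not start a new test.
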
Below is a complete example:

    #document
    | <!-- This is longer than most tests -->
    | <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    | <html>
    |   lang="en"
    |   <head>
    |   <body>
    |     style="font-family: "Times New Roman""
    |     <svg svg>
    |       xml xml:id="image"
    |     <div>
    |       "This is a text node.
    It has an embedded newline. It is in fact pretty "busy" and has
    multiple newlines.

    And even a blank line."
    |       <!-- This comment also
    has a newline -->