diff --git a/tests/cases/serializer/README.md b/tests/cases/serializer/README.md
index 25e9326..824bf7e 100644
--- a/tests/cases/serializer/README.md
+++ b/tests/cases/serializer/README.md
@@ -1,7 +1,9 @@
 HTML DOM serialization tests
 ============================
 
-The format of these tests is essentially the format of html5lib's tree construction tests in reverse. There are, however, important differences, so the format is documented in full here.
+The format of these tests is essentially the format of html5lib's tree
+construction tests in reverse. There are, however, important differences,
+so the format is documented in full here.
 
 Each file containing tree construction tests consists of any number of
 tests separated by two newlines (LF) and a single newline before the end
@@ -15,9 +17,83 @@ of the file. For instance:
 
 Where [TEST] is the following format:
 
-Each test begins with a line reading "#document" or "#fragment"; subsequent
+Each test begins with a line reading `#document` or `#fragment`; subsequent
 lines represent the document or document fragment (respectively) used as
-input, until a line is encountered which reads "#output", "#script-on",
-or "#script-off".
+input, until a line is encountered which reads `#output`, `#script-on`,
+or `#script-off`.
 
+Each DOM node in the input is written on its own line beginning with the
+characters "| " (a vertical bar followed by a single space); lines which begin
+with other characters are a continuation of the previous line. Attributes
+are treated as distinct nodes and have their own entries. There is no escape
+mechanism: all input is literal, including newlines and quotation marks. Two
+spaces are used to denote each level of nesting.
+For example:
+    | node
+    |   child node
+    continuation of child node
+    |     grandchild node
+    |   child node
+    |     attribute node of child
+    |     grandchild node
+
+The different types of nodes are:
+
+- Element nodes in the form `` for an element in the HTML namespace,
+  or `` for an element in a foreign namespace. Qualified names are
+  written as usual, e.g. ``, though such elements are not
+  produced by the parser
+- Attribute nodes in the form `id="value"` or e.g. `xml xml:id="value"`, with
+  a quotation mark immediately followed by a newline marking the end of the
+  attribute value (in other words, attribute values may contain literal
+  quotation marks)
+- Text nodes in the form `"text data"`; like attributes, only a quotation mark
+  followed by a newline marks the end of text data
+- Comment nodes of the form ``; the space characters are
+  padding and are not part of the comment data
+- Document type nodes in the form ``, or
+  `` or simply `` depending on its contents
+- Processing instructions in the form ``. Processing
+  instructions are not generated by the HTML parser, but may appear in
+  documents by other means
+
+Namespaces are represented by the following short names:
+
+| Name  | URL                                  |
+|-------|--------------------------------------|
+| xml   | http://www.w3.org/XML/1998/namespace |
+| xmlns | http://www.w3.org/2000/xmlns/        |
+| xlink | http://www.w3.org/1999/xlink         |
+| math  | http://www.w3.org/1998/Math/MathML   |
+| svg   | http://www.w3.org/2000/svg           |
+
+Other namespaces may also appear; these should be interpreted as literal URLs.
+
+After the input block either `#script-on` or `#script-off` may appear. These
+signal that the test should be run with scripting on or off, respectively. If
+neither line is present, the test should be run in both modes.
+
+Finally, `#output` marks the beginning of output. All subsequent text is
+literal characters until two consecutive newlines followed by either
+`#document` or `#fragment` are seen.
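
Review note: the rules described so far can be sketched as a small reader. This is an illustrative sketch only, not part of the patch and not the project's actual test harness. The names `split_tests` and `parse_test` are hypothetical, and the code assumes that nesting is counted as pairs of spaces following the leading "| " and that continuation lines are joined to the previous node with a literal newline. It does not interpret the individual node types.

```python
import re


def split_tests(file_text):
    """Split a test file into individual tests (hypothetical helper)."""
    # The file ends with a single newline that belongs to no test.
    if file_text.endswith("\n"):
        file_text = file_text[:-1]
    # Tests are separated by two newlines followed by a new test header.
    return re.split(r"\n\n(?=#(?:document|fragment)\n)", file_text)


def parse_test(text):
    """Return (kind, nodes, scripting, output) for one test.

    nodes is a list of (depth, content) tuples; scripting is True, False,
    or None when the test should be run in both modes.
    """
    lines = text.split("\n")
    kind = lines[0]
    if kind not in ("#document", "#fragment"):
        raise ValueError("test must begin with #document or #fragment")

    markers = ("#output", "#script-on", "#script-off")
    nodes = []
    i = 1
    while i < len(lines) and lines[i] not in markers:
        line = lines[i]
        if line.startswith("| "):
            body = line[2:]
            content = body.lstrip(" ")
            # Assumption: two spaces after "| " per level of nesting.
            depth = (len(body) - len(content)) // 2
            nodes.append([depth, content])
        elif nodes:
            # Any other line continues the previous node, literally.
            nodes[-1][1] += "\n" + line
        else:
            raise ValueError("continuation line before any node")
        i += 1

    scripting = None
    if i < len(lines) and lines[i] in ("#script-on", "#script-off"):
        scripting = lines[i] == "#script-on"
        i += 1
    if i >= len(lines) or lines[i] != "#output":
        raise ValueError("missing #output line")

    # Everything after #output is literal expected output.
    output = "\n".join(lines[i + 1:])
    return kind, [tuple(n) for n in nodes], scripting, output
```

Because the format has no escape mechanism, the sketch never unquotes or unescapes anything: node content and output are carried through exactly as written.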
+ +Below is a complete example: + + #document + | + | + | + | lang="en" + | + | + | style="font-family: "Times New Roman"" + | + | xml xml:id="image" + |
+ | "This is a text node. + It has an embedded newline. It is in fact pretty "busy" and has + multiple newlines. + + And even a blank line." + |