Flesh documentation for serializer test format

The example is still incomplete
3 years ago · 2889107844
1 changed file with 80 additions and 4 deletions
--- a/tests/cases/serializer/README.md
+++ b/tests/cases/serializer/README.md
@@ -1,7 +1,9 @@
 HTML DOM serialization tests
 ============================

-The format of these tests is essentially the format of html5lib's tree construction tests in reverse. There are, however, important differences, so the format is documented in full here.
+The format of these tests is essentially the format of html5lib's tree
+construction tests in reverse. There are, however, important differences,
+so the format is documented in full here.

 Each file containing tree construction tests consists of any number of
 tests separated by two newlines (LF) and a single newline before the end
@@ -15,9 +17,83 @@ of the file. For instance:

 Where [TEST] is the following format:

-Each test begins with a line reading "#document" or "#fragment"; subsequent
+Each test begins with a line reading `#document` or `#fragment`; subsequent
 lines represent the document or document fragment (respectively) used as
-input, until a line is encountered which reads "#output", "#script-on",
-or "#script-off".
+input, until a line is encountered which reads `#output`, `#script-on`,
+or `#script-off`.

+Each DOM node in the input is written on its own line beginning with the
+characters "| " (a vertical bar followed by a single space); lines which begin
+with other characters are a continuation of the previous line. Attributes
+are treated as distinct nodes and have their own entries. There is no escape
+mechanism: all input is literal, including newlines and quotation marks. Two
+spaces are used to denote each level of nesting. For example:

+    | node
+    |   child node
+    continuation of child node
+    |     grandchild node
+    |   child node
+    |     attribute node of child
+    |     grandchild node
+
+The different types of nodes are:
+
+- Element nodes in the form `<body>` for an element in the HTML namespace,
+  or `<svg svg>` for an element in a foreign namespace. Qualified names are
+  written as usual, e.g. `<math math:math>`, though such elements are not
+  produced by the parser
+- Attribute nodes in the form `id="value"` or e.g. `xml xml:id="value"`, with
+  a quotation mark immediately followed by a newline marking the end of the
+  attribute value (in other words, attribute values may contain literal
+  quotation marks)
+- Text nodes in the form `"text data"`; as with attributes, only a quotation
+  mark followed by a newline marks the end of the text data
+- Comment nodes in the form `<!-- comment data -->`; the single spaces after
+  `<!--` and before `-->` are padding and are not part of the comment data
+- Document type nodes in the form `<!DOCTYPE html "public" "system">`, or
+  `<!DOCTYPE html>` or simply `<!DOCTYPE>` depending on its contents
+- Processing instructions in the form `<?target PI data>`. Processing
+  instructions are not generated by the HTML parser, but may appear in
+  documents by other means
+
+Namespaces are represented by the following short names:
+
+| Name  | URL                                  |
+|-------|--------------------------------------|
+| xml   | http://www.w3.org/XML/1998/namespace |
+| xmlns | http://www.w3.org/2000/xmlns/        |
+| xlink | http://www.w3.org/1999/xlink         |
+| math  | http://www.w3.org/1998/Math/MathML   |
+| svg   | http://www.w3.org/2000/svg           |
+
+Other namespaces may also appear; these should be interpreted as literal URLs.
+
+After the input block, either `#script-on` or `#script-off` may appear. These
+signal that the test should be run with scripting on or off, respectively. If
+neither line is present, the test should be run in both modes.
+
+Finally, `#output` marks the beginning of output. All subsequent text is
+literal until two consecutive newlines followed by either `#document` or
+`#fragment` are seen, or until the end of the file.
+
+Below is a complete example:
+
+    #document
+    | <!-- This is longer than most tests -->
+    | <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
+    | <html>
+    |   lang="en"
+    |   <head>
+    |   <body>
+    |     style="font-family: "Times New Roman""
+    |     <svg svg>
+    |       xml xml:id="image"
+    |     <div>
+    |       "This is a text node.
+    It has an embedded newline. It is in fact pretty "busy" and has
+    multiple newlines.
+
+    And even a blank line."
+    |       <!-- This comment also
+    has a newline -->
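The rules above are regular enough to read mechanically. What follows is a minimal, unofficial Python sketch of a reader for the format as this README describes it. It is not part of the test suite, and every name in it (`split_tests`, `parse_test`, `parse_nodes`, `node_kind`, `element_name`, `SerializerTest`, `Node`) is hypothetical. It makes three assumptions. First, a file's single trailing newline is not part of the last test's output. Second, an unprefixed element such as `<body>` belongs to the HTML namespace `http://www.w3.org/1999/xhtml`, which the namespace table does not list. Third, nodes are classified by their first characters only. That last rule is a heuristic and can misread an attribute whose name begins with `<` or `"`.

```python
import re
from dataclasses import dataclass

# Short names from the namespace table; any other name is taken as a literal URL.
NAMESPACES = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xmlns": "http://www.w3.org/2000/xmlns/",
    "xlink": "http://www.w3.org/1999/xlink",
    "math": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
}
# Assumption: unprefixed element names such as <body> mean the HTML namespace.
HTML_NS = "http://www.w3.org/1999/xhtml"

# A new test begins after two newlines followed by a #document/#fragment line.
TEST_BOUNDARY = re.compile(r"\n\n(?=#(?:document|fragment)\n)")


@dataclass
class SerializerTest:
    kind: str               # "document" or "fragment"
    input_lines: list[str]  # raw "| " node lines plus continuation lines
    scripting: str          # "on", "off" or "both"
    output: str             # expected serialization, kept literally


@dataclass
class Node:
    depth: int  # nesting level: two spaces per level after the "| " prefix
    text: str   # node text, continuation lines joined with "\n"


def split_tests(text: str) -> list[str]:
    # Assumption: the file's single trailing newline is not part of the last output.
    if text.endswith("\n"):
        text = text[:-1]
    return TEST_BOUNDARY.split(text)


def parse_test(chunk: str) -> SerializerTest:
    header, _, rest = chunk.partition("\n")
    if header not in ("#document", "#fragment"):
        raise ValueError(f"unexpected test header: {header!r}")
    lines = rest.split("\n")
    markers = ("#output", "#script-on", "#script-off")
    i = 0
    while i < len(lines) and lines[i] not in markers:  # the input block
        i += 1
    input_lines, scripting = lines[:i], "both"
    if i < len(lines) and lines[i] in ("#script-on", "#script-off"):
        scripting = lines[i].removeprefix("#script-")
        i += 1
    if i >= len(lines) or lines[i] != "#output":
        raise ValueError("missing #output line")
    # Everything after #output is literal, blank lines included.
    return SerializerTest(header[1:], input_lines, scripting, "\n".join(lines[i + 1:]))


def parse_nodes(input_lines: list[str]) -> list[Node]:
    nodes: list[Node] = []
    for line in input_lines:
        if line.startswith("| "):
            body = line[2:]
            stripped = body.lstrip(" ")
            indent = len(body) - len(stripped)
            if indent % 2:
                raise ValueError(f"odd indentation: {line!r}")
            nodes.append(Node(indent // 2, stripped))
        elif nodes:
            # No escaping: any other line continues the previous node.
            nodes[-1].text += "\n" + line
        else:
            raise ValueError(f"continuation line before any node: {line!r}")
    return nodes


def node_kind(text: str) -> str:
    # Heuristic on the leading characters; "<!--", "<!DOCTYPE" and "<?" are
    # checked before the generic "<" used by elements.
    for prefix, kind in (("<!--", "comment"), ("<!DOCTYPE", "doctype"),
                         ("<?", "processing-instruction"), ("<", "element"),
                         ('"', "text")):
        if text.startswith(prefix):
            return kind
    return "attribute"


def element_name(text: str) -> tuple[str, str]:
    # "<svg svg>" -> (SVG namespace URL, "svg"); "<body>" -> (HTML_NS, "body")
    prefix, _, name = text[1:-1].rpartition(" ")
    if not prefix:
        return HTML_NS, name
    return NAMESPACES.get(prefix, prefix), name
```

Used together, `[parse_test(chunk) for chunk in split_tests(text)]` yields one `SerializerTest` per test. `parse_nodes` then turns each input block into depth and text pairs, joining continuation lines such as those in the text node and comment of the example above once its four-space Markdown indentation is removed. The expected output is kept as a single literal string because the format has no escape mechanism.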