• Originally I had a concept of a read-only node tree for grammars, with nodes owning other nodes, thinking it would be necessary when tokenizing. It isn't, so the tree is more trouble than it's worth.
• "Ownership" in Grammar\Reference objects is handled by an ownerGrammarScopeName property, which is then used to get the grammar from the GrammarRegistry.
• Added pattern match anchor support.
• Data is now an instantiable class that supports only string input.
• Data now has firstLine, lastLine, and lastLineBeforeFinalNewLine properties to facilitate anchoring.
• Highlight now has a static toDOM method for highlighting to a DOM tree; it replaces the withFile and withString methods, which accepted different kinds of input.
• Tokenizer now outputs a newline token only if the line is not the last line.
• Tokenizer now discards pattern match regexes if their anchors are invalid for the current line.
• Tokenizer no longer mistakenly emits empty-string tokens.
• Previously, the first pattern whose regex matched the line was processed into tokens. This is apparently incorrect. Instead, the pattern whose regex match starts closest to the current offset wins. The changes reflect this.
• When parsing JSON grammars, only unescaped forward slashes in match
regexes are now escaped.
• When parsing JSON grammars, Unicode code points larger than 0x10ffff
in match regexes are now clamped to 0x10ffff, the largest possible
Unicode code point.
• Content names should be applied only to what is between begin/end
patterns. This might still need a fix so that they are not applied to
the end patterns themselves.
• Added a flag for begin patterns.
• Trying to handle begin/end patterns better. Begin patterns shouldn't automatically remove themselves from the stack; their corresponding end pattern should remove them instead.
• Added preliminary transformation of out-of-range code points in matches.
• Fixed adoption of Grammar\Pattern objects.
• Fixed retrieval of Grammar\RepositoryReferences.
• Lines are now converted to UTF-32 while tokenizing so that byte
offsets can be cleanly converted to character offsets.
• When grammars are parsed into Grammar objects, begin and end matches
are now converted to regular matches by adding the end match to the
pattern's pattern list, which simplifies tokenization.
• Highlight::withFile and Highlight::withString now accept an encoding
parameter, which defaults to UTF-8.
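
The "closest offset wins" rule from the tokenizer entry above is easy to get wrong, so here is a minimal sketch of it. It is written in Python for brevity and is not the library's own code: the function name earliest_match, the flat list of compiled regexes, and the tie-breaking rule (the earlier-listed pattern wins) are all assumptions made for illustration.

```python
import re


def earliest_match(patterns, line, offset):
    """Return (index, match) for the pattern whose match starts nearest
    to `offset`, or None if no pattern matches at or after it.

    Hypothetical helper: `patterns` is a plain list of compiled regexes.
    """
    best = None
    for index, pattern in enumerate(patterns):
        match = pattern.search(line, offset)
        if match is None:
            continue
        # Strict "<" keeps the earlier-listed pattern on ties (an assumption).
        if best is None or match.start() < best[1].start():
            best = (index, match)
    return best


patterns = [re.compile(r"\d+"), re.compile(r"[a-z]+")]

# The digits pattern is listed first, but "foo" starts nearer to offset 0,
# so the second pattern (index 1) is the one that gets tokenized.
index, match = earliest_match(patterns, "  foo 42", 0)
```

Under the old behavior the first-listed pattern would have produced the token for "42" and skipped over "foo"; choosing by match position avoids that.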