• The offset calculation after handling overlapping tokens is now aware of invalid capture offsets (meaning the capture matched nothing).
• Tokenizer::tokenizeLine now correctly stops looking for new matches when the newly tokenized pattern is an end pattern.
• Grammars no longer have beginCaptures incorrectly applied to end patterns.
• Originally I had a concept of a read-only node tree for grammars, with nodes owning other nodes, thinking it would be necessary when tokenizing. It isn't, so they're more trouble than they're worth.
• "Ownership" in Grammar\Reference objects is handled by an ownerGrammarScopeName property, which is then used to get the grammar from the GrammarRegistry.
• When parsing JSON grammars, only unescaped forward slashes in match
regexes are now escaped.
• When parsing JSON grammars, Unicode character codes larger than
0x10ffff in match regexes are now capped at 0x10ffff, the highest valid
Unicode code point.
• Content names should only be applied to what is between begin/end
patterns. This might need a fix so that they are not applied to the end
patterns themselves.
• Added a flag for begin patterns.
• Trying to handle begin/end patterns better. Begin patterns shouldn't automatically remove themselves from the stack; their corresponding end patterns should remove them instead.
• Added preliminary transformation of out-of-range codepoints in matches.
• Fixed adoption of Grammar\Pattern objects.
• Fixed retrieval of Grammar\RepositoryReferences.
• Lines are now converted to UTF-32 while tokenizing so that byte
offsets may be cleanly converted to character offsets.
• When grammars are parsed into Grammar objects, begin and end matches
are now converted to regular matches by adding end matches to the
pattern's pattern list. This simplifies tokenization.
• Highlight::withFile and Highlight::withString now accept an encoding
parameter which defaults to UTF-8.
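
The two JSON grammar regex entries above (escaping only unescaped forward slashes, and capping code points above 0x10ffff) can be sketched roughly as follows. This is a minimal illustration and not the library's actual code: the function names escapeBareSlashes and capCodepoints are hypothetical, and it assumes out-of-range code points appear in the \x{...} form.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): escapes forward slashes
 * that are not already escaped, leaving existing escape sequences alone.
 */
function escapeBareSlashes(string $regex): string
{
    // Match either a complete escape sequence (backslash + any char) or a
    // bare slash. Consuming escape sequences first means a slash preceded
    // by an odd number of backslashes is never seen as "bare".
    $result = preg_replace_callback(
        '~\\\\.|/~s',
        fn (array $m): string => $m[0] === '/' ? '\\/' : $m[0],
        $regex
    );
    return $result ?? $regex;
}

/**
 * Hypothetical helper (not part of the library): caps \x{...} escapes
 * whose value exceeds 0x10FFFF, the highest valid Unicode code point.
 */
function capCodepoints(string $regex): string
{
    // The second alternative swallows other escape sequences so that an
    // escaped backslash followed by a literal "x{...}" is left untouched.
    $result = preg_replace_callback(
        '~\\\\x\{([0-9A-Fa-f]+)\}|\\\\.~s',
        function (array $m): string {
            if (isset($m[1]) && $m[1] !== '' && hexdec($m[1]) > 0x10FFFF) {
                return '\\x{10FFFF}';
            }
            return $m[0];
        },
        $regex
    );
    return $result ?? $regex;
}
```

Both helpers walk the pattern one escape sequence at a time, which avoids the usual pitfall of a lookbehind that miscounts consecutive backslashes.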
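
The UTF-32 entry above relies on every code point occupying exactly four bytes in UTF-32, so a byte length divided by four gives a character count. A minimal sketch of that idea, assuming the mbstring extension is available and that the byte offset falls on a character boundary (the function name byteOffsetToCharOffset is hypothetical and not taken from the library):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): converts a byte offset
 * within a UTF-8 line into a character offset.
 */
function byteOffsetToCharOffset(string $line, int $byteOffset): int
{
    // Re-encode the bytes before the offset as UTF-32BE (no BOM), where
    // every code point is exactly 4 bytes wide.
    $prefix = mb_convert_encoding(substr($line, 0, $byteOffset), 'UTF-32BE', 'UTF-8');
    return intdiv(strlen($prefix), 4);
}

// Usage: preg_match reports byte offsets, even with the /u modifier.
$line = "na\u{ef}ve caf\u{e9} = 1";
preg_match('/=/u', $line, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];                                  // 13 bytes in
$charOffset = byteOffsetToCharOffset($line, $byteOffset); // 11 characters in
```

The two accented letters take two bytes each in UTF-8, which is why the byte offset and the character offset differ by two here.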