• I -really- hate debugging this because there's no reference to go by to ensure things are correct except trial and error.
• Sometimes when resolving scope names the wrong match would end up in the name.
• Because of how references are handled in this implementation, a leftover pattern containing a single reference would sometimes remain when popping off the rule and scope stacks. It would cause havoc, so a bit of a hack is needed to circumvent that. It can probably be simplified in the future, because checking against the end pattern the way it does now probably isn't necessary, but this works at present.
• The offset calculation after handling overlapping tokens is now aware of invalid capture offsets (meaning they matched nothing).
• Tokenizer::tokenizeLine now correctly stops looking for new matches when the newly tokenized pattern was an end pattern.
• Grammars no longer have beginCaptures incorrectly applied to end patterns.
• Originally I had a concept of a readonly node tree for grammars, with nodes owning other nodes, thinking it would be necessary when tokenizing. It isn't, so the nodes are more trouble than they're worth.
• "ownership" in Grammar\Reference objects is handled by an ownerGrammarScopeName property which is then used to get the grammar from the GrammarRegistry.
• When parsing JSON grammars, only unescaped forward slashes in match
regexes are now escaped.
• When parsing JSON grammars, Unicode code points larger than 0x10ffff
in match regexes are now clamped to 0x10ffff, the largest valid Unicode
code point.
• Content names should only be applied to what is between begin/end
patterns. Might need to fix to not apply to end patterns themselves.
• Added a flag for begin patterns
• Trying to handle begin/end patterns better. Begin patterns shouldn't automatically remove themselves from the stack; their corresponding end pattern should remove them instead.
• Added preliminary transformation of out-of-range codepoints in matches
• Fixed adoption of Grammar\Pattern objects.
• Fixed retrieval of Grammar\RepositoryReferences.
• Lines are now converted to UTF-32 while tokenizing so that byte
offsets may be cleanly converted to character offsets
• When grammars are parsed into Grammar objects, begin and end matches
are now converted to regular matches by adding end matches to the
pattern's pattern list, which simplifies tokenization.
• Highlight::withFile and Highlight::withString now accept an encoding
parameter which defaults to UTF-8.
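
The forward-slash note above ("only escape unescaped forward slashes") can be sketched roughly as follows. This is an illustration in Python, not the project's code: the function name escape_unescaped_slashes is hypothetical, and the approach (walking the pattern and skipping over each backslash pair) is just one way to avoid double-escaping a slash that is already escaped.

```python
def escape_unescaped_slashes(pattern: str) -> str:
    """Escape "/" only where it is not already preceded by an escaping backslash.

    Hypothetical helper for illustration; the real parser may work differently.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash escapes the next character: copy both untouched.
            # This also covers an already-escaped slash ("\/").
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "/":
            # A bare slash: escape it.
            out.append("\\/")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Consuming backslashes in pairs means a slash after an escaped backslash (two backslashes, then a slash) is still treated as unescaped and gets escaped.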
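
The handling of code points above 0x10ffff noted above could look something like this. It is a minimal Python sketch under stated assumptions: it assumes the out-of-range values appear as \x{HEX} escapes in the regex source, and clamp_code_points is a hypothetical name. It also does not check whether the backslash before the x is itself escaped.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # largest valid Unicode code point

# Assumption: code points are written as \x{HEX} in the grammar's regexes.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def clamp_code_points(pattern: str) -> str:
    """Replace any \\x{...} escape above 0x10ffff with \\x{10ffff}."""

    def repl(match: re.Match) -> str:
        value = int(match.group(1), 16)
        if value > MAX_CODE_POINT:
            return "\\x{10ffff}"
        # In-range escapes are left exactly as written.
        return match.group(0)

    return _HEX_ESCAPE.sub(repl, pattern)
```

For example, a character class ending in \x{7fffffff} would come out ending in \x{10ffff}, while \x{1F600} is left alone.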
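
As for why converting lines to UTF-32 makes offsets easy to translate: every character takes exactly four bytes, so a byte offset becomes a character offset with a single division. The Python sketch below only illustrates that arithmetic. The helper name is hypothetical and it says nothing about how the tokenizer actually obtains its byte offsets.

```python
def utf32_byte_to_char_offset(byte_offset: int) -> int:
    """Convert a byte offset in a BOM-less UTF-32 string to a character offset."""
    if byte_offset % 4 != 0:
        raise ValueError("offset does not fall on a UTF-32 character boundary")
    return byte_offset // 4


line = "héllo wörld"
needle = "wörld"

# In UTF-8 the byte offset (7) differs from the character offset (6),
# because "é" occupies two bytes.
utf8_byte_offset = line.encode("utf-8").find(needle.encode("utf-8"))

# In UTF-32 (little-endian, no BOM) the byte offset is always 4x the
# character offset.
utf32_byte_offset = line.encode("utf-32-le").find(needle.encode("utf-32-le"))
char_offset = utf32_byte_to_char_offset(utf32_byte_offset)
```

With a variable-width encoding such as UTF-8, the same conversion would require counting the characters before the offset instead of dividing.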