• I -really- hate debugging this because there's no reference to go by to ensure things are correct except trial and error.
• Sometimes when resolving scope names the wrong match would end up in the name.
• Because of how references are handled in this implementation, popping the rule and scope stacks can sometimes leave behind a leftover pattern containing a single reference. That caused havoc, so a workaround is needed to circumvent it. It can probably be simplified in the future because checking against the end pattern the way it does now probably isn't necessary, but it works at present.
• Fixed bug where nonexistent grammars would cause tokenizer to fail.
• Added mensbeam/html as a dependency, removed docopt/docopt and
• Discovered bug when injections are removed from the stack when
• When calculating the offset after handling overlapping tokens, the tokenizer is now aware of invalid capture offsets (meaning the capture matched nothing).
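As an illustrative sketch of the idea (Python, with assumed names — not the project's actual code): Oniguruma- and PCRE-style engines report an offset pair of (-1, -1) for capture groups that did not participate in a match, and treating those as real positions would rewind the scanner.

```python
def next_offset(match_end: int, capture_offsets: list[tuple[int, int]]) -> int:
    """Pick the next scan offset, skipping captures that matched nothing.

    A capture reported as (-1, -1) did not participate in the match;
    using it as a position would incorrectly pull the offset backwards.
    """
    valid_ends = [end for (start, end) in capture_offsets
                  if start != -1 and end != -1]
    return max([match_end, *valid_ends])
```

Here the invalid capture is simply ignored rather than contributing a bogus offset of -1 or 0.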
• Tokenizer::tokenizeLine now correctly does not continue looking for new matches when the newly tokenized pattern was an end pattern.
• Grammars no longer have beginCaptures incorrectly applied to end patterns.
• Originally I had a concept of a readonly node tree for grammars, with nodes owning other nodes, thinking it would be necessary when tokenizing. It isn't, so the nodes are more trouble than they're worth.
• "ownership" in Grammar\Reference objects is handled by an ownerGrammarScopeName property which is then used to get the grammar from the GrammarRegistry.
• Added pattern match anchor support.
• Data is now an instanced class with support only for string input.
• Data now has firstLine, lastLine, and lastLineBeforeFinalNewLine properties to facilitate anchoring.
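A minimal sketch of what those properties might look like (Python; the property names mirror the ones above, but the logic here is an assumption, not the project's implementation):

```python
class Data:
    """Line-oriented view of string input for anchoring decisions."""

    def __init__(self, data: str) -> None:
        # Splitting on "\n" leaves a trailing empty string when the
        # input ends with a final newline.
        self._lines = data.split("\n")

    def firstLine(self, index: int) -> bool:
        return index == 0

    def lastLine(self, index: int) -> bool:
        return index == len(self._lines) - 1

    def lastLineBeforeFinalNewLine(self, index: int) -> bool:
        # The line before the trailing empty string is the last "real"
        # line when the input ends with a newline.
        if self._lines and self._lines[-1] == "":
            return index == len(self._lines) - 2
        return self.lastLine(index)
```

Distinguishing the last line from the last line before a final newline matters for anchors like \z, which should only be valid at the true end of the input.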
• Highlight now has a static toDOM method for highlighting to a DOM tree, replacing the withFile and withString methods for accepting different kinds of input.
• Tokenizer now only outputs newline tokens if not on the last line.
• Tokenizer now throws out pattern match regexes if their anchors are invalid for the current line.
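The check could look something like this sketch (Python, assumed names; the real implementation may rewrite or compile the regexes differently): anchors like \A can only ever match on the first line of the input and \z / \Z only on the last, so patterns carrying them are discarded for any other line.

```python
import re

def anchors_valid(regex: str, first_line: bool, last_line: bool) -> bool:
    """Decide whether a pattern's anchors can match on the current line.

    \\A only matches at the start of the document and \\z / \\Z only at
    its end; on other lines such patterns can never match and are
    thrown out. Escaped backslashes (e.g. the literal text "\\\\A") are
    not treated as anchors.
    """
    # Find anchor escapes preceded by an even number of backslashes.
    unescaped = re.findall(r'(?<!\\)(?:\\\\)*\\([AzZ])', regex)
    if 'A' in unescaped and not first_line:
        return False
    if ('z' in unescaped or 'Z' in unescaped) and not last_line:
        return False
    return True
```

TextMate-style grammars also use \G (match at the position where the begin pattern ended), which would need similar per-position handling; it is omitted here for brevity.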
• Tokenizer now won't mistakenly emit empty string tokens.
• Previously, the first pattern whose regex matched the line would be processed into tokens. This apparently is incorrect. Instead, the pattern whose match offset is closest to the current offset wins. Changes reflect this.
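The selection rule above can be sketched like this (Python, assumed names; tie-breaking on pattern order is an assumption mirroring typical TextMate behaviour):

```python
import re

def best_match(patterns: list[str], line: str, offset: int):
    """Among all patterns matching at or after `offset`, return the
    match that starts closest to `offset`; ties go to the pattern that
    appears earlier in the list."""
    best = None
    for pattern in patterns:
        m = re.compile(pattern).search(line, offset)
        if m is not None and (best is None or m.start() < best.start()):
            best = m
    return best
```

So given the line "xxaab", a pattern matching "a+" at offset 2 beats a pattern matching "b+" at offset 4 even if "b+" is listed first, which is the behaviour the change describes.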
• When parsing JSON grammars match regexes now only escape unescaped
• When parsing JSON grammars, match regexes now truncate Unicode
character codes larger than 0x10FFFF to 0x10FFFF, the largest possible
Unicode code point.
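A sketch of that clamping step (Python; the \x{...} escape syntax is Oniguruma's, and the function name is hypothetical):

```python
import re

MAX_CODEPOINT = 0x10FFFF  # largest valid Unicode code point

def clamp_codepoints(regex: str) -> str:
    """Truncate \\x{...} escapes above U+10FFFF down to U+10FFFF so the
    regex engine is never handed an impossible code point."""
    def repl(m: re.Match) -> str:
        value = int(m.group(1), 16)
        return r'\x{%X}' % min(value, MAX_CODEPOINT)
    return re.sub(r'\\x\{([0-9A-Fa-f]+)\}', repl, regex)
```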
• Content names should only be applied to what is between begin/end
patterns. Might need a fix so they are not applied to the end patterns
themselves.