• Fixed a bug where nonexistent grammars would cause the tokenizer to fail.
• Added mensbeam/html as a dependency, removed docopt/docopt and
ext-mbstring.
• Discovered a bug where injections are removed from the stack while
tokenizing; investigating.
• When calculating the offset after handling overlapping tokens, the tokenizer is now aware of invalid capture offsets (meaning the capture matched nothing).
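The invalid-offset case above can be illustrated with a small sketch (in Python; the project itself is PHP, where PREG_OFFSET_CAPTURE behaves the same way): a capture group that participated in no match reports a sentinel offset of -1, which must be skipped rather than treated as a real position.

```python
import re

# A group that matched nothing reports a span of (-1, -1); offset
# arithmetic after handling overlapping tokens must skip such captures.
m = re.match(r"(a)(b)?(c)", "ac")
valid_spans = []
for i in range(1, 4):
    start, end = m.span(i)
    if start == -1:
        continue  # capture matched nothing; its offset is invalid
    valid_spans.append((i, start, end))
print(valid_spans)  # group 2 is absent
```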
• Tokenizer::tokenizeLine now correctly stops looking for new matches once the newly tokenized pattern is an end pattern.
• Grammars no longer have beginCaptures incorrectly applied to end patterns.
• Added pattern match anchor support.
• Data is now an instantiable class that supports only string input.
• Data now has firstLine, lastLine, and lastLineBeforeFinalNewLine properties to facilitate anchoring.
• Highlight now has a static toDOM method for highlighting into a DOM tree, replacing the withFile and withString methods that accepted different kinds of input.
• Tokenizer now outputs a newline token only if the line is not the last line.
• Tokenizer now throws out pattern match regexes if their anchors are invalid for the current line.
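The anchor screening described above can be sketched roughly as follows (Python for illustration; the project is PHP, and anchor_is_valid is a hypothetical helper, not the project's API): a regex anchored to the start of the input can only ever match on the first line, so running it against later lines is wasted work.

```python
# Hypothetical sketch of discarding pattern regexes whose anchors are
# invalid for the current line: \A only matches on the first line of the
# input and \z only on the last, so anchored patterns are skipped elsewhere.
def anchor_is_valid(regex: str, is_first_line: bool, is_last_line: bool) -> bool:
    if r"\A" in regex and not is_first_line:
        return False
    if r"\z" in regex and not is_last_line:
        return False
    return True

print(anchor_is_valid(r"\A#!", is_first_line=True, is_last_line=False))   # True
print(anchor_is_valid(r"\A#!", is_first_line=False, is_last_line=False))  # False
```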
• Tokenizer now won't mistakenly emit empty string tokens.
• Previously, the first pattern whose regex matched the line was processed into tokens. This turned out to be incorrect: the pattern whose match offset is closest to the current offset should win instead. Changes reflect this.
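The closest-offset-wins rule can be sketched like this (Python for illustration; best_match is a hypothetical helper standing in for the project's PHP internals):

```python
import re

# Among all patterns that match at or after the current offset, the one
# whose match starts closest to that offset wins; earlier-listed patterns
# win ties because of the strict < comparison.
def best_match(patterns, line, offset):
    best = None
    for pattern in patterns:
        m = re.compile(pattern).search(line, offset)
        if m is not None and (best is None or m.start() < best[1].start()):
            best = (pattern, m)
    return best

# "world" is listed first, but "hello" matches closer to offset 0 and wins.
print(best_match([r"world", r"hello"], "hello world", 0)[0])
```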
• Added a flag for begin patterns.
• Trying to handle begin/end patterns better: begin patterns shouldn't automatically remove themselves from the stack; their corresponding end pattern should remove them instead.
• Lines are now converted to UTF-32 while tokenizing so that byte
offsets can be cleanly converted to character offsets.
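Why UTF-32 makes this clean can be shown in a few lines (Python for illustration; the project is PHP): every code point occupies exactly four bytes, so any byte offset a byte-oriented regex engine reports divides evenly into a character offset.

```python
# UTF-32 is fixed-width (4 bytes per code point), so a byte offset into
# the encoded line converts to a character offset by dividing by 4.
line = "naïve — tokenizer"
encoded = line.encode("utf-32-le")  # -le avoids a byte-order mark

byte_offset = encoded.find("tokenizer".encode("utf-32-le"))
char_offset = byte_offset // 4
print(char_offset, line[char_offset:])
```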
• When grammars are parsed into Grammar objects, begin and end matches
are now converted to regular matches by adding the end match to the
pattern's pattern list, simplifying tokenization.
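The grammar transformation above might look roughly like this (a Python sketch of the idea only; flatten_begin_end and the isEnd flag are hypothetical names, not the project's API): the end regex becomes an ordinary pattern prepended to the rule's own pattern list, so the tokenizer only ever deals with plain matches.

```python
# Hypothetical sketch of converting a begin/end rule into regular matches:
# the end regex joins the rule's pattern list as an ordinary pattern,
# flagged so the tokenizer knows that matching it pops the stack.
def flatten_begin_end(rule: dict) -> dict:
    if "begin" in rule and "end" in rule:
        end_pattern = {"match": rule.pop("end"), "isEnd": True}
        rule["match"] = rule.pop("begin")
        rule["patterns"] = [end_pattern] + rule.get("patterns", [])
    return rule

string_rule = {"begin": r'"', "end": r'"', "patterns": [{"match": r"\\."}]}
print(flatten_begin_end(string_rule))
```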
• Highlight::withFile and Highlight::withString now accept an encoding
parameter which defaults to UTF-8.