TextMate-style syntax highlighting in PHP

Lit

Lit is a multilanguage syntax highlighter written in PHP. It takes code as input and returns an HTML pre element containing the code, highlighted using span elements whose classes are derived from the tokens in the code. It is loosely based on Atom's Highlights, which the Atom text editor uses to syntax highlight code. Atom's Highlights is in turn based on TextMate's syntax highlighting, using its concepts of scope selectors and common keywords for components of programming languages. Lit is not a port of Atom's Highlights but an independent implementation of what I can understand of TextMate's grammar syntax, parsing, and tokenization from analyzing other implementations. It aims for at least feature parity with Atom's Highlights.

Warning Before Using

This library is experimental. The code is not tested at all, and writing tests for it would be an enormous undertaking because there is no specification to test against. Doing so would require writing a specification as well, which is beyond the scope of a project that exists just to scratch an itch. Atom's Highlights is itself barely tested for the same reason. There are numerous PHP libraries without a test suite, but lacking one is not up to our usual standards, which is why this warning exists.

Documentation

MensBeam\Lit\Grammar::__construct

Creates a new MensBeam\Lit\Grammar object.

public function MensBeam\Lit\Grammar::__construct(?string $scopeName = null, ?array $patterns = null, ?string $name = null, ?array $injections = null, ?array $repository = null)

Parameters

In normal usage of the library the parameters won't be used (see MensBeam\Lit\Grammar::loadJSON and examples below for more information), but they are listed below for completeness' sake.

scopeName - The scope name of the grammar
patterns - The list of patterns in the grammar
name - A human-readable name for the grammar
injections - The list of injections in the grammar
repository - The list of repository items in the grammar

MensBeam\Lit\Grammar::loadJSON

Imports an Atom JSON grammar into the MensBeam\Lit\Grammar object.

public function MensBeam\Lit\Grammar::loadJSON(string $filename)

Parameters

filename - The JSON file to be imported
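
For orientation, the sketch below writes a minimal grammar file for the imaginary Ook language. The top-level keys mirror the constructor parameters listed above (name, scopeName, patterns); the match/name pattern shape follows the usual TextMate convention and is an assumption here, as are the regular expression and the temporary file path. Whether loadJSON accepts exactly this file has not been verified.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal grammar for the imaginary Ook language.
// The keys mirror Grammar::__construct()'s parameters; the pattern
// shape (match + name) is the common TextMate convention, assumed here.
$grammar = [
    'name' => 'Ook',
    'scopeName' => 'source.ook',
    'patterns' => [
        ['match' => '\\bOok[.!?]', 'name' => 'keyword.other.ook'],
    ],
];

// Write the grammar somewhere loadJSON() could later read it from.
$path = sys_get_temp_dir() . '/source.ook.json';
file_put_contents(
    $path,
    json_encode($grammar, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR)
);

echo file_get_contents($path), "\n";
```

The resulting path is what would be passed as the filename parameter.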

MensBeam\Lit\GrammarRegistry::clear

Clears all grammars from the registry.

public static function MensBeam\Lit\GrammarRegistry::clear()

MensBeam\Lit\GrammarRegistry::get

Retrieves a grammar from the registry.

public static function MensBeam\Lit\GrammarRegistry::get(string $scopeName): Grammar|false

Parameters

scopeName - The scope name (e.g. text.html.php) of the grammar that is being requested

Return Values

Returns a MensBeam\Lit\Grammar object on success and false on failure.
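
Because get returns false on failure, callers may want to check it before highlighting. The following is a minimal sketch of that idea, not part of Lit: highlightOrEscape is a hypothetical helper, and the plain escaped pre fallback is an assumption about what a caller might want. The class_exists checks only exist so the sketch also runs where Lit is not installed.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Lit): highlight when a grammar is
// available, otherwise fall back to an escaped, unhighlighted <pre>.
function highlightOrEscape(string $code, string $scopeName): string {
    $registry = 'MensBeam\Lit\GrammarRegistry';
    $highlight = 'MensBeam\Lit\Highlight';

    // class_exists() triggers autoloading, so this also covers the case
    // where Lit is not installed at all.
    if (class_exists($registry) && class_exists($highlight)
        && $registry::get($scopeName) !== false) {
        return $highlight::toString($code, $scopeName, 'UTF-8');
    }

    // Fallback: no highlighting, but the code is still safe to embed in HTML.
    return '<pre><code>'
        . htmlspecialchars($code, ENT_QUOTES | ENT_HTML5, 'UTF-8')
        . '</code></pre>';
}

echo highlightOrEscape('<?php echo "OOK!"; ?>', 'source.ook'), "\n";
```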

MensBeam\Lit\GrammarRegistry::set

Adds a grammar to the registry.

public static function MensBeam\Lit\GrammarRegistry::set(string $scopeName, MensBeam\Lit\Grammar $grammar): bool

Parameters

scopeName - The scope name (e.g. text.html.php) of the grammar that is being set
grammar - The grammar to be put into the registry

Return Values

Returns true on success and false on failure.

MensBeam\Lit\Highlight::toElement

Highlights incoming string data and outputs a PHP DOMElement.

public static function MensBeam\Lit\Highlight::toElement(string $data, string $scopeName, ?\DOMDocument $document = null, string $encoding = 'windows-1252'): \DOMElement

Parameters

data - The input data string
scopeName - The scope name (e.g. text.html.php) of the grammar that's needed to highlight the input data
document - An existing DOMDocument to use as the owner document of the returned DOMElement; if omitted, a new one is created
encoding - The encoding of the input data string; used only if no document was provided in the previous parameter, otherwise the encoding of the existing DOMDocument is used; defaults to the HTML standard default, windows-1252

Return Values

Returns a pre DOMElement.

MensBeam\Lit\Highlight::toString

Highlights incoming string data and outputs a string containing serialized HTML.

public static function MensBeam\Lit\Highlight::toString(string $data, string $scopeName, string $encoding = 'windows-1252'): string

Parameters

data - The input data string
scopeName - The scope name (e.g. text.html.php) of the grammar that's needed to highlight the input data
encoding - The encoding of the input data string; defaults to the HTML standard default, windows-1252

Return Values

Returns a string.

Examples

Here's an example of highlighting PHP code:

$code = <<<CODE
<?php
echo "🐵 OOK! 🐵";
?>
CODE;

// Use UTF-8 as the encoding to preserve the emojis.
$element = MensBeam\Lit\Highlight::toElement($code, 'text.html.php', null, 'UTF-8');
$element->setAttribute('class', 'highlighted');

// Use PHP DOM's DOMDocument::saveHTML method to print the highlighted markup
// when finished with manipulating it.
echo $element->ownerDocument->saveHTML($element);

This will produce:

<pre class="highlighted"><code class="text html php"><span class="meta embedded block php"><span class="punctuation section embedded begin php">&lt;?php</span><span class="source php">
<span class="support function construct output php">echo</span> <span class="string quoted double php"><span class="punctuation definition string begin php">"</span>🐵 OOK! 🐵<span class="punctuation definition string end php">"</span></span><span class="punctuation terminator expression php">;</span>
</span><span class="punctuation section embedded end php"><span class="source php">?</span>&gt;</span></span></code></pre>
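
The class tokens in markup like the above are what a stylesheet has to target. As a rough sketch, the distinct tokens can be collected with PHP's DOM and XPath; the markup string here is a shortened, hand-copied stand-in for real Lit output so that the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Shortened stand-in for Lit's output (emoji and embedded-block wrappers removed).
$html = '<pre class="highlighted"><code class="text html php">'
      . '<span class="support function construct output php">echo</span> '
      . '<span class="string quoted double php">'
      . '<span class="punctuation definition string begin php">"</span>OOK!'
      . '<span class="punctuation definition string end php">"</span></span>'
      . '<span class="punctuation terminator expression php">;</span>'
      . '</code></pre>';

$document = new DOMDocument();
// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Gather every distinct class token used on a span.
$xpath = new DOMXPath($document);
$classes = [];
foreach ($xpath->query('//span[@class]') as $span) {
    foreach (preg_split('/\s+/', trim($span->getAttribute('class'))) as $token) {
        $classes[$token] = true;
    }
}
$classes = array_keys($classes);
sort($classes);

echo implode(' ', $classes), "\n";
```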

An already existing DOMDocument may be used as the owner document of the returned pre element:

...
$document = new DOMDocument();
// $element will be owned by $document.
$element = MensBeam\Lit\Highlight::toElement($code, 'text.html.php', $document);
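
Ownership matters because PHP's DOM refuses to append a node that belongs to a different document. The sketch below illustrates the difference using only PHP's built-in DOM; makeStandInPre is a hypothetical stand-in for the pre element toElement would return, so the snippet runs without Lit.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the <pre> element Highlight::toElement() returns;
// built by hand here so the sketch runs without Lit installed.
function makeStandInPre(DOMDocument $owner): DOMElement {
    $pre = $owner->createElement('pre');
    $code = $owner->createElement('code');
    $code->setAttribute('class', 'text html php');
    $code->appendChild($owner->createTextNode('echo "OOK!";'));
    $pre->appendChild($code);
    return $pre;
}

$page = new DOMDocument();
$body = $page->appendChild($page->createElement('body'));

// Case 1: the element is already owned by $page, so it can be appended as-is.
$owned = makeStandInPre($page);
$body->appendChild($owned);

// Case 2: the element belongs to another document and must be imported first;
// appending it directly would throw a DOMException (wrong document).
$other = new DOMDocument();
$foreign = makeStandInPre($other);
$imported = $page->importNode($foreign, true);
$body->appendChild($imported);

echo $page->saveHTML($body), "\n";
```

Passing the target document to toElement corresponds to the first case and avoids the extra import step.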

Other DOM libraries which inherit from and/or encapsulate PHP's DOM such as MensBeam\HTML-DOM may also be used:

...
$document = new MensBeam\HTML\DOM\Document();
// $element will be owned by $document.
$element = MensBeam\Lit\Highlight::toElement($code, 'text.html.php', $document->innerNode);
$element = $element->ownerDocument->getWrapperNode($element);
// MensBeam\HTML\DOM\Element can simply be cast to a string to serialize.
$string = (string)$element;

Of course Lit can simply output a string, too:

...
$string = MensBeam\Lit\Highlight::toString($code, 'text.html.php');

Lit has quite a long list of languages supported out of the box, but sometimes other languages need to be highlighted:

...
// Import a hypothetical Ook Atom JSON language grammar into a Grammar object
// and add it to the registry.
$grammar = new MensBeam\Lit\Grammar;
$grammar->loadJSON('/path/to/source.ook.json');
MensBeam\Lit\GrammarRegistry::set($grammar->scopeName, $grammar);

// Now the grammar can be used to highlight code
$element = MensBeam\Lit\Highlight::toElement($code, $grammar->scopeName);

Supported Languages & Formats

  • AppleScript
  • C
  • C#
  • C# Cake file
  • C# Script file
  • C++
  • CoffeeScript
  • CSS
  • Diff
  • GitHub Flavored Markdown
  • Git config
  • Go
  • Go modules
  • Go templates
  • HTML
  • Java
  • Java expression language
  • Java properties
  • JavaScript
  • JavaScript Regular Expressions
  • JSDoc
  • JSON
  • Less
  • Lua
  • Makefile
  • Markdown (CommonMark)
  • Objective-C
  • Perl
  • Perl 6
  • PHP
  • Plist
  • Plist (XML, old-style)
  • Python
  • Python console
  • Python Regular Expressions
  • Python traceback
  • Ruby
  • Ruby gemfile
  • Ruby on Rails (RJS)
  • Rust
  • Sass
  • SassDoc
  • SCSS
  • Shell (Bash)
  • Shell session (Bash)
  • SQL
  • SQL (Mustache)
  • Textile
  • Todo
  • XML
  • XSL