Lexer
An executable lexer of Leo.
This is a simple lexer for the Leo lexical grammar.
Efficiency is not the primary focus for now;
correctness and simplicity are.
In the future, we may optimize this lexer,
or we may use APT to do so.
To support context-dependent lexing,
this lexer does not provide any code to lex a file prior to parsing;
rather, it provides lexing code that the parser calls as needed.
The lexer consists of a collection of lexing functions,
each of which takes a list of natural numbers as input,
which represents the Unicode characters that remain to lex
in the Leo file being lexed.
Each function returns two results:
the first result is either an error
or an ABNF tree (or list of trees) for the recognized lexeme(s);
the second result is
the remaining list of natural numbers after the lexeme.
While from a conceptual point of view
it would be better for all these lexing functions to return a single result
that is either an error
or a pair consisting of an ABNF tree or tree list plus remaining inputs,
by returning two results we make the execution more efficient
by avoiding constructing and deconstructing the pair.
Some of the code of this lexer is generated
via the parser generation tools in the ABNF library
(where `parser' in that context refers to the general idea of
recognizing and structuring strings in a formal language,
which also describes what Leo lexing does).
Other code is written by hand,
due to limitations in the aforementioned parser generation tools,
such as the efficiency of the generated code.
Subtopics
- Lex-symbol
- Lex a symbol.
- Lex-rest-of-block-comment-+-lex-rest-of-block-comment-after-star
- Lex a rest-of-block-comment
or a rest-of-block-comment-after-star.
- Lex-not-double-quote-or-backslash-or-line-feed-or-carriage-return
- Parse a not-double-quote-or-backslash-or-line-feed-or-carriage-return.
- Lex-not-star-or-slash-or-line-feed-or-carriage-return
- Parse a not-star-or-slash-or-line-feed-or-carriage-return.
- Lex-identifier/keyword/boolean/address
- Lex an identifier
or a keyword
or a boolean-literal
or an address-literal.
- Lex-simple-character-escape
- Parse a simple-character-escape.
- Lex-hexadecimal-digit
- Parse a hexadecimal-digit.
- Lex-token
- Lex a token.
- Lex-not-star-or-line-feed-or-carriage-return
- Parse a not-star-or-line-feed-or-carriage-return.
- Lex-u8/16/32/64/128
- Parse a ( %s"u8" / %s"u16" / %s"u32" / %s"u64" / %s"u128" ).
- Lex-not-line-feed-or-carriage-return
- Parse a not-line-feed-or-carriage-return.
- Lex-i8/16/32/64/128
- Parse a ( %s"i8" / %s"i16" / %s"i32" / %s"i64" / %s"i128" ).
- Lex-string-literal-element
- Parse a string-literal-element.
- Lex-safe-ascii
- Parse a safe-ascii.
- Lex-numeric-literal
- Lex an atomic-literal that is a numeric literal.
- Lex-safe-nonascii
- Parse a safe-nonascii.
- Lex-letter/decdigit/underscore
- Parse a ( letter / decimal-digit / "_" ).
- Lex-line-terminator
- Parse a line-terminator.
- Lex-unicode-character-escape
- Parse a unicode-character-escape.
- Lex-ascii-character-escape
- Parse a ascii-character-escape.
- Lex-1*6-hexadecimal-digit
- Lex a 1*6hexadecimal-digit.
- Lex-whitespace
- Parse a whitespace.
- Lex-unsigned-literal
- Parse a unsigned-literal.
- Lex-lcletter/decdigit
- Parse a ( lowercase-letter / decimal-digit ).
- Lex-string-literal
- Parse a string-literal.
- Lex-signed-literal
- Parse a signed-literal.
- Lex-product-group-literal
- Parse a product-group-literal.
- Lex-lexeme
- Parse a lexeme.
- Lex-single-quote-escape
- Parse a single-quote-escape.
- Lex-integer-literal
- Parse a integer-literal.
- Lex-double-quote-escape
- Parse a double-quote-escape.
- Lex-address-literal
- Parse a address-literal.
- Lex-scalar-literal
- Parse a scalar-literal.
- Lex-null-character-escape
- Parse a null-character-escape.
- Lex-identifier
- Parse a identifier.
- Lex-horizontal-tab-escape
- Parse a horizontal-tab-escape.
- Lex-carriage-return-escape
- Parse a carriage-return-escape.
- Lex-line-comment
- Parse a line-comment.
- Lex-letter
- Parse a letter.
- Lex-field-literal
- Parse a field-literal.
- Lex-comment
- Parse a comment.
- Lex-block-comment
- Parse a block-comment.
- Lex-uppercase-letter
- Parse a uppercase-letter.
- Lex-lowercase-letter
- Parse a lowercase-letter.
- Lex-line-feed-escape
- Parse a line-feed-escape.
- Lex-horizontal-tab
- Parse a horizontal-tab.
- Lex-carriage-return
- Parse a carriage-return.
- Lex-backslash-escape
- Parse a backslash-escape.
- Lex-annotation
- Parse a annotation.
- Lex-visible-ascii
- Parse a visible-ascii.
- Lex-single-quote
- Parse a single-quote.
- Lex-octal-digit
- Parse a octal-digit.
- Lex-double-quote
- Parse a double-quote.
- Lex-decimal-digit
- Parse a decimal-digit.
- Lex-*-not-line-feed-or-carriage-return
- Parse a *not-line-feed-or-carriage-return.
- Lex-line-feed
- Parse a line-feed.
- Lex-space
- Parse a space.
- Lex-numeral
- Parse a numeral.
- Lex-*-letter/decdigit/underscore
- Parse a *( letter / decimal-digit / "_" ).
- Lex-*-string-literal-element
- Parse a *string-literal-element.
- Lex-58-lcletter/decdigit
- Lex a 58( lowercase-letter / decimal-digit ).
- Lexemize-leo-from-string
- Lexes the UTF-8 leo-string into a list of lexemes.
- Lex-*-lcletter/decdigit
- Parse a *( lowercase-letter / decimal-digit ).
- Lex-*-hexadecimal-digit
- Parse a *hexadecimal-digit.
- Lex-1*-decimal-digit
- Lex a 1*decimal-digit.
- Lex-*-decimal-digit
- Parse a *decimal-digit.
- Lex-*-lexeme
- Parse a *lexeme.
- Lexemize-leo-from-bytes
- Lexes the UTF-8 bytes into a list of lexemes.
- Lexemize-leo
- Lexes the Unicode code points leo-codepoints into a list of lexemes.
- *defparse-leo-repetition-table*
- Table of repetition parsing functions.
- *defparse-leo-group-table*
- Table of group parsing functions.
- Lex-generation-macros
- Lexing generation macros specialized to this lexer.
- Lex-generation-tables
- ABNF group, option, and repetition tables
for generating lexing functions.
- *defparse-leo-option-table*
- Table of option parsing functions.