• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
      • Apt
      • Zfc
      • Acre
      • Milawa
      • Smtlink
      • Abnf
      • Vwsim
      • Isar
      • Wp-gen
      • Dimacs-reader
      • Pfcs
      • Legacy-defrstobj
      • Proof-checker-array
      • Soft
      • C
      • Farray
      • Rp-rewriter
      • Instant-runoff-voting
      • Imp-language
      • Sidekick
      • Leftist-trees
      • Java
      • Riscv
      • Taspi
      • Bitcoin
      • Des
      • Ethereum
      • X86isa
      • Sha-2
      • Yul
      • Zcash
      • Proof-checker-itp13
      • Regex
      • ACL2-programming-language
      • Json
      • Jfkr
      • Equational
      • Cryptography
      • Poseidon
      • Where-do-i-place-my-book
      • Axe
      • Aleo
        • Aleobft
        • Aleovm
        • Leo
          • Grammar
          • Early-version
            • Json2ast
            • Testing
            • Definition
              • Flattening
              • Abstract-syntax
              • Dynamic-semantics
              • Compilation
              • Static-semantics
              • Concrete-syntax
                • Pretty-printer
                • Grammar
                • Lexing-and-parsing
                  • Lexer
                    • Lex-symbol
                    • Lex-rest-of-block-comment-+-lex-rest-of-block-comment-after-star
                    • Lex-not-double-quote-or-backslash-or-line-feed-or-carriage-return
                    • Lex-not-star-or-slash-or-line-feed-or-carriage-return
                    • Lex-identifier/keyword/boolean/address
                    • Lex-simple-character-escape
                    • Lex-hexadecimal-digit
                    • Lex-token
                    • Lex-not-star-or-line-feed-or-carriage-return
                    • Lex-u8/16/32/64/128
                    • Lex-not-line-feed-or-carriage-return
                    • Lex-i8/16/32/64/128
                    • Lex-string-literal-element
                    • Lex-safe-ascii
                    • Lex-numeric-literal
                    • Lex-safe-nonascii
                    • Lex-letter/decdigit/underscore
                    • Lex-line-terminator
                    • Lex-unicode-character-escape
                    • Lex-ascii-character-escape
                    • Lex-1*6-hexadecimal-digit
                    • Lex-whitespace
                    • Lex-unsigned-literal
                    • Lex-lcletter/decdigit
                    • Lex-string-literal
                    • Lex-signed-literal
                    • Lex-product-group-literal
                    • Lex-lexeme
                    • Lex-single-quote-escape
                    • Lex-integer-literal
                    • Lex-double-quote-escape
                    • Lex-address-literal
                    • Lex-scalar-literal
                    • Lex-null-character-escape
                    • Lex-identifier
                    • Lex-horizontal-tab-escape
                    • Lex-carriage-return-escape
                    • Lex-line-comment
                    • Lex-letter
                    • Lex-field-literal
                    • Lex-comment
                    • Lex-block-comment
                    • Lex-uppercase-letter
                    • Lex-lowercase-letter
                    • Lex-line-feed-escape
                    • Lex-horizontal-tab
                    • Lex-carriage-return
                    • Lex-backslash-escape
                    • Lex-annotation
                    • Lex-visible-ascii
                    • Lex-single-quote
                    • Lex-octal-digit
                    • Lex-double-quote
                    • Lex-decimal-digit
                    • Lex-*-not-line-feed-or-carriage-return
                    • Lex-line-feed
                    • Lex-space
                    • Lex-numeral
                    • Lex-*-letter/decdigit/underscore
                    • Lex-*-string-literal-element
                    • Lex-58-lcletter/decdigit
                    • Lexemize-leo-from-string
                    • Lex-*-lcletter/decdigit
                    • Lex-*-hexadecimal-digit
                    • Lex-1*-decimal-digit
                    • Lex-*-decimal-digit
                    • Lex-*-lexeme
                    • Lexemize-leo-from-bytes
                    • Lexemize-leo
                    • *defparse-leo-repetition-table*
                    • *defparse-leo-group-table*
                    • Lex-generation-macros
                    • Lex-generation-tables
                    • *defparse-leo-option-table*
                  • Parser
                  • Token-fringe
                  • Longest-lexp
                  • Parser-interface
                  • Grammar-lexp
                  • Identifier-lexp
                  • Output-file-parsep
                  • Input-file-parsep
                  • File-lex-parse-p
                  • Filter-tokens
                  • Lexp
                  • File-parsep
                  • Input-parser
                  • Output-file-lex-parse-p
                  • Input-file-lex-parse-p
                  • Parser-abstractor-interface
                  • Identifier-abnf-stringp
                  • Symbol-abnf-stringp
                  • Keyword-abnf-stringp
                  • Output-parser
                  • Tokenizer
                • Input-pretty-printer
                • Output-pretty-printer
                • Unicode-characters
                • Concrete-syntax-trees
                • Symbols
                • Keywords
      • Bigmems
      • Builtins
      • Execloader
      • Solidity
      • Paco
      • Concurrent-programs
      • Bls12-377-curves
    • Debugging
    • Community
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
    • Math
    • Testing-utilities
  • Lexing-and-parsing

Lexer

An executable lexer of Leo.

This is a simple lexer for the Leo lexical grammar. Efficiency is not the primary focus for now; correctness and simplicity are. In the future, we may optimize this lexer, or we may use APT to do so.

To support context-dependent lexing, this lexer does not provide any code to lex a file prior to parsing; rather, it provides lexing code that the parser calls as needed.

The lexer consists of a collection of lexing functions, each of which takes a list of natural numbers as input, which represents the Unicode characters that remain to lex in the Leo file being lexed. Each function returns two results: the first result is either an error or an ABNF tree (or list of trees) for the recognized lexeme(s); the second result is the remaining list of natural numbers after the lexeme. While from a conceptual point of view it would be better for all these lexing functions to return a single result that is either an error or a pair consisting of an ABNF tree or tree list plus remaining inputs, by returning two results we make the execution more efficient by avoiding constructing and deconstructing the pair.

Some of the code of this lexer is generated via the parser generation tools in the ABNF library (where `parser' in that context refers to the general idea of recognizing and structuring strings in a formal language, which also describes what Leo lexing does). Other code is written by hand, due to limitations in the aforementioned parser generation tools, such as the efficiency of the generated code.

Subtopics

Lex-symbol
Lex a symbol.
Lex-rest-of-block-comment-+-lex-rest-of-block-comment-after-star
Lex a rest-of-block-comment or a rest-of-block-comment-after-star.
Lex-not-double-quote-or-backslash-or-line-feed-or-carriage-return
Parse a not-double-quote-or-backslash-or-line-feed-or-carriage-return.
Lex-not-star-or-slash-or-line-feed-or-carriage-return
Parse a not-star-or-slash-or-line-feed-or-carriage-return.
Lex-identifier/keyword/boolean/address
Lex an identifier or a keyword or a boolean-literal or an address-literal.
Lex-simple-character-escape
Parse a simple-character-escape.
Lex-hexadecimal-digit
Parse a hexadecimal-digit.
Lex-token
Lex a token.
Lex-not-star-or-line-feed-or-carriage-return
Parse a not-star-or-line-feed-or-carriage-return.
Lex-u8/16/32/64/128
Parse a ( %s"u8" / %s"u16" / %s"u32" / %s"u64" / %s"u128" ).
Lex-not-line-feed-or-carriage-return
Parse a not-line-feed-or-carriage-return.
Lex-i8/16/32/64/128
Parse a ( %s"i8" / %s"i16" / %s"i32" / %s"i64" / %s"i128" ).
Lex-string-literal-element
Parse a string-literal-element.
Lex-safe-ascii
Parse a safe-ascii.
Lex-numeric-literal
Lex an atomic-literal that is a numeric literal.
Lex-safe-nonascii
Parse a safe-nonascii.
Lex-letter/decdigit/underscore
Parse a ( letter / decimal-digit / "_" ).
Lex-line-terminator
Parse a line-terminator.
Lex-unicode-character-escape
Parse a unicode-character-escape.
Lex-ascii-character-escape
Parse a ascii-character-escape.
Lex-1*6-hexadecimal-digit
Lex a 1*6hexadecimal-digit.
Lex-whitespace
Parse a whitespace.
Lex-unsigned-literal
Parse a unsigned-literal.
Lex-lcletter/decdigit
Parse a ( lowercase-letter / decimal-digit ).
Lex-string-literal
Parse a string-literal.
Lex-signed-literal
Parse a signed-literal.
Lex-product-group-literal
Parse a product-group-literal.
Lex-lexeme
Parse a lexeme.
Lex-single-quote-escape
Parse a single-quote-escape.
Lex-integer-literal
Parse a integer-literal.
Lex-double-quote-escape
Parse a double-quote-escape.
Lex-address-literal
Parse a address-literal.
Lex-scalar-literal
Parse a scalar-literal.
Lex-null-character-escape
Parse a null-character-escape.
Lex-identifier
Parse a identifier.
Lex-horizontal-tab-escape
Parse a horizontal-tab-escape.
Lex-carriage-return-escape
Parse a carriage-return-escape.
Lex-line-comment
Parse a line-comment.
Lex-letter
Parse a letter.
Lex-field-literal
Parse a field-literal.
Lex-comment
Parse a comment.
Lex-block-comment
Parse a block-comment.
Lex-uppercase-letter
Parse a uppercase-letter.
Lex-lowercase-letter
Parse a lowercase-letter.
Lex-line-feed-escape
Parse a line-feed-escape.
Lex-horizontal-tab
Parse a horizontal-tab.
Lex-carriage-return
Parse a carriage-return.
Lex-backslash-escape
Parse a backslash-escape.
Lex-annotation
Parse a annotation.
Lex-visible-ascii
Parse a visible-ascii.
Lex-single-quote
Parse a single-quote.
Lex-octal-digit
Parse a octal-digit.
Lex-double-quote
Parse a double-quote.
Lex-decimal-digit
Parse a decimal-digit.
Lex-*-not-line-feed-or-carriage-return
Parse a *not-line-feed-or-carriage-return.
Lex-line-feed
Parse a line-feed.
Lex-space
Parse a space.
Lex-numeral
Parse a numeral.
Lex-*-letter/decdigit/underscore
Parse a *( letter / decimal-digit / "_" ).
Lex-*-string-literal-element
Parse a *string-literal-element.
Lex-58-lcletter/decdigit
Lex a 58( lowercase-letter / decimal-digit ).
Lexemize-leo-from-string
Lexes the UTF-8 leo-string into a list of lexemes.
Lex-*-lcletter/decdigit
Parse a *( lowercase-letter / decimal-digit ).
Lex-*-hexadecimal-digit
Parse a *hexadecimal-digit.
Lex-1*-decimal-digit
Lex a 1*decimal-digit.
Lex-*-decimal-digit
Parse a *decimal-digit.
Lex-*-lexeme
Parse a *lexeme.
Lexemize-leo-from-bytes
Lexes the UTF-8 bytes into a list of lexemes.
Lexemize-leo
Lexes the Unicode code points leo-codepoints into a list of lexemes.
*defparse-leo-repetition-table*
Table of repetition parsing functions.
*defparse-leo-group-table*
Table of group parsing functions.
Lex-generation-macros
Lexing generation macros specialized to this lexer.
Lex-generation-tables
ABNF group, option, and repetition tables for generating lexing functions.
*defparse-leo-option-table*
Table of option parsing functions.