
    Lexemize-leo

    Lexes the Unicode code points leo-codepoints into a list of lexemes.

    Signature
    (lexemize-leo leo-codepoints) → leo-lexemes
    Arguments
    leo-codepoints — Guard (nat-listp leo-codepoints).
    Returns
    leo-lexemes — Type (abnf::tree-list-resultp leo-lexemes).

    A lexeme is a token, comment, or whitespace. lexemize-leo returns a single result value: either a list of these lexemes, each in abnf::tree form, or an error. Tokens are further classified as keyword, literal, identifier, or symbol. Recombining the lexemes is left to the parser.

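    If the underlying lexing step returns an error, or if some input is left over after lexing, the result is an error value recognized by reserrp rather than a list of trees.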
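
    As a rough illustration of how a caller might tell the two outcomes apart, here is a minimal sketch. It assumes the book defining lexemize-leo is already included and that the current package can see lexemize-leo, b*, and reserrp. The :lex-failed and :lexeme-count tags are hypothetical names chosen for this example only.

```lisp
;; Sketch only: assumes lexemize-leo, b*, and reserrp are available
;; in the current session and package.
(b* ((codepoints (list 108 101 116))   ; Unicode code points of "let"
     (lexemes (lexemize-leo codepoints))
     ;; A result error means lexing failed or input was left over.
     ((when (reserrp lexemes))
      (list :lex-failed lexemes)))     ; hypothetical tag for this example
  ;; Otherwise lexemes is a list of ABNF trees, one per lexeme.
  (list :lexeme-count (len lexemes)))  ; hypothetical tag for this example
```

    Checking reserrp before using the value mirrors what the definition itself does with the output of lex-*-lexeme.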

    Definitions and Theorems

    Function: lexemize-leo

    (defun lexemize-leo (leo-codepoints)
     (declare (xargs :guard (nat-listp leo-codepoints)))
     (let ((__function__ 'lexemize-leo))
      (declare (ignorable __function__))
      (b* (((mv trees rest-input)
            (lex-*-lexeme leo-codepoints))
           ((when (reserrp trees))
            (reserrf (cons :unexpected-reserrp trees)))
           ((unless (null rest-input))
            (reserrf (cons :cannot-fully-lex (cons trees rest-input)))))
        trees)))

    Theorem: tree-list-resultp-of-lexemize-leo

    (defthm tree-list-resultp-of-lexemize-leo
      (b* ((leo-lexemes (lexemize-leo leo-codepoints)))
        (abnf::tree-list-resultp leo-lexemes))
      :rule-classes :rewrite)