• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
      • Apt
      • Zfc
      • Acre
      • Milawa
      • Smtlink
      • Abnf
      • Vwsim
      • Isar
      • Wp-gen
      • Dimacs-reader
      • Pfcs
      • Legacy-defrstobj
      • Proof-checker-array
      • Soft
      • C
      • Farray
      • Rp-rewriter
      • Instant-runoff-voting
      • Imp-language
      • Sidekick
      • Leftist-trees
      • Java
      • Riscv
      • Taspi
      • Bitcoin
      • Des
      • Ethereum
      • X86isa
      • Sha-2
      • Yul
      • Zcash
      • Proof-checker-itp13
      • Regex
      • ACL2-programming-language
      • Json
      • Jfkr
      • Equational
      • Cryptography
      • Poseidon
      • Where-do-i-place-my-book
      • Axe
      • Aleo
        • Aleobft
        • Aleovm
        • Leo
          • Grammar
          • Early-version
            • Json2ast
            • Testing
            • Definition
              • Flattening
              • Abstract-syntax
              • Dynamic-semantics
              • Compilation
              • Static-semantics
              • Concrete-syntax
                • Pretty-printer
                • Grammar
                • Lexing-and-parsing
                  • Lexer
                  • Parser
                  • Token-fringe
                  • Longest-lexp
                    • Parser-interface
                    • Grammar-lexp
                    • Identifier-lexp
                    • Output-file-parsep
                    • Input-file-parsep
                    • File-lex-parse-p
                    • Filter-tokens
                    • Lexp
                    • File-parsep
                    • Input-parser
                    • Output-file-lex-parse-p
                    • Input-file-lex-parse-p
                    • Parser-abstractor-interface
                    • Identifier-abnf-stringp
                    • Symbol-abnf-stringp
                    • Keyword-abnf-stringp
                    • Output-parser
                    • Tokenizer
                  • Input-pretty-printer
                  • Output-pretty-printer
                  • Unicode-characters
                  • Concrete-syntax-trees
                  • Symbols
                  • Keywords
        • Bigmems
        • Builtins
        • Execloader
        • Solidity
        • Paco
        • Concurrent-programs
        • Bls12-377-curves
      • Debugging
      • Community
      • Std
      • Proof-automation
      • Macro-libraries
      • ACL2
      • Interfacing-tools
      • Hardware-verification
      • Software-verification
      • Math
      • Testing-utilities
    • Lexing-and-parsing

    Longest-lexp

    Longest lexeme rule.

    Signature
    (longest-lexp trees) → yes/no
    Arguments
    trees — Guard (abnf::tree-listp trees).
    Returns
    yes/no — Type (booleanp yes/no).

    Some ambiguities in the lexical grammar arise because some lexemes are concatenations of others. In the example in grammar-lexp, ** is the concatenation of * and *.

    These ambiguities are eliminated by imposing the extra-grammatical requirement that lexing always take the longest possible sequence of characters that may constitute a lexeme, regardless of what follows. In particular, this requirement applies even when taking the longest sequence of characters causes subsequent lexing and parsing to fail, whereas taking a shorter sequence would let subsequent lexing and parsing succeed. For example, functionf is lexed as the single identifier functionf even though lexing it as the keyword function followed by the identifier f may cause the rest of the code to be successfully lexed and parsed.

    This longest lexeme rule is formalized by this predicate, which applies to the CST sequence trees that grammar-lexp relates to Unicode character sequence. This predicate says that, for each CST tree in trees except the last one, there must exist no other CST tree1 for a lexeme whose fringe (i) properly extends the fringe of tree (i.e. the latter is a prefix of the former, and they are not equal) and (ii) is a prefix of the fringe of tree and subsequent CSTs. In other words, after lexing the fringe of the CSTs that precede tree in trees, tree is the CST for the longest possible lexeme starting at the fringe of tree and subsequent CSTs: there is no other possible CST at that point for a longer lexeme.

    Given a Unicode character sequence ucodes, if we restrict the CST sequence trees for which grammar-lexp holds to additionally satisfy longest-lexp, lexing is still ambiguous. There is some overlap between the grammatical definitions of keywords, boolean literals, address literals, and identifiers. Even though the rule for identifier include an ABNF comment indicating the exclusion of keywords and boolean literals and anything equal to or starting with aleo1, the actual ABNF definitions allow that: grammatically, a keyword is also an identifier, a boolean literal is also an identifier, and an identifier may be or start with aleo1.

    Definitions and Theorems

    Theorem: exist-longer-lexeme-p-suff

    (defthm exist-longer-lexeme-p-suff
      (implies (and (abnf-tree-with-root-p tree1 "lexeme")
                    (prefixp (abnf::tree->string tree)
                             (abnf::tree->string tree1))
                    (not (equal (abnf::tree->string tree)
                                (abnf::tree->string tree1)))
                    (prefixp (abnf::tree->string tree1)
                             (abnf::tree-list->string trees)))
               (exist-longer-lexeme-p tree trees)))

    Theorem: booleanp-of-exist-longer-lexeme-p

    (defthm booleanp-of-exist-longer-lexeme-p
      (b* ((yes/no (exist-longer-lexeme-p tree trees)))
        (booleanp yes/no))
      :rule-classes :rewrite)

    Theorem: exist-longer-lexeme-p-of-tree-fix-tree

    (defthm exist-longer-lexeme-p-of-tree-fix-tree
      (equal (exist-longer-lexeme-p (abnf::tree-fix tree)
                                    trees)
             (exist-longer-lexeme-p tree trees)))

    Theorem: exist-longer-lexeme-p-tree-equiv-congruence-on-tree

    (defthm exist-longer-lexeme-p-tree-equiv-congruence-on-tree
      (implies (abnf::tree-equiv tree tree-equiv)
               (equal (exist-longer-lexeme-p tree trees)
                      (exist-longer-lexeme-p tree-equiv trees)))
      :rule-classes :congruence)

    Theorem: exist-longer-lexeme-p-of-tree-list-fix-trees

    (defthm exist-longer-lexeme-p-of-tree-list-fix-trees
      (equal (exist-longer-lexeme-p tree (abnf::tree-list-fix trees))
             (exist-longer-lexeme-p tree trees)))

    Theorem: exist-longer-lexeme-p-tree-list-equiv-congruence-on-trees

    (defthm exist-longer-lexeme-p-tree-list-equiv-congruence-on-trees
      (implies (abnf::tree-list-equiv trees trees-equiv)
               (equal (exist-longer-lexeme-p tree trees)
                      (exist-longer-lexeme-p tree trees-equiv)))
      :rule-classes :congruence)

    Function: longest-lexp

    (defun longest-lexp (trees)
      (declare (xargs :guard (abnf::tree-listp trees)))
      (let ((__function__ 'longest-lexp))
        (declare (ignorable __function__))
        (or (endp trees)
            (endp (cdr trees))
            (and (not (exist-longer-lexeme-p (car trees)
                                             trees))
                 (longest-lexp (cdr trees))))))

    Theorem: booleanp-of-longest-lexp

    (defthm booleanp-of-longest-lexp
      (b* ((yes/no (longest-lexp trees)))
        (booleanp yes/no))
      :rule-classes :rewrite)

    Theorem: longest-lexp-of-tree-list-fix-trees

    (defthm longest-lexp-of-tree-list-fix-trees
      (equal (longest-lexp (abnf::tree-list-fix trees))
             (longest-lexp trees)))

    Theorem: longest-lexp-tree-list-equiv-congruence-on-trees

    (defthm longest-lexp-tree-list-equiv-congruence-on-trees
      (implies (abnf::tree-list-equiv trees trees-equiv)
               (equal (longest-lexp trees)
                      (longest-lexp trees-equiv)))
      :rule-classes :congruence)