• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
      • Apt
      • Zfc
      • Acre
      • Milawa
      • Smtlink
      • Abnf
      • Vwsim
      • Isar
      • Wp-gen
      • Dimacs-reader
      • Pfcs
        • Proof-support
        • Abstract-syntax
        • R1cs-subset
        • Semantics
        • Abstract-syntax-operations
        • Indexed-names
        • Well-formedness
        • Concrete-syntax
          • Lexer
            • Lex-separator
            • Lex-operator
            • Lex-group-letter/digit/_
            • Lex-token
            • Lex-whitespace
            • Lex-line-terminator
            • Lex-identifier
            • Lex-lexeme
            • Lex-letter
            • Lex-uppercase-letter
            • Lex-lowercase-letter
            • Lex-integer
            • Lex-carriage-return
            • Lex-line-feed
            • Lex-space
            • Lex-optional-hyphen
            • Lex-optional-cr
            • Lex-numeral
            • Lex-digit
            • Lex-*-rest-of-identifier
            • Lex-1*-digit
            • Lexemize-pfcs-from-string
            • Lex-*-lexeme
            • Lex-*-digit
            • Lexemize-pfcs-from-bytes
            • Lexemize-pfcs
            • *defparse-pfcs-repetition-table*
            • *defparse-pfcs-option-table*
            • Lex-generation-macros
            • *defparse-pfcs-group-table*
            • Lex-generation-tables
          • Grammar
          • Parser
          • Tokenizer
        • R1cs-bridge
        • Parser-interface
      • Legacy-defrstobj
      • Proof-checker-array
      • Soft
      • C
      • Farray
      • Rp-rewriter
      • Instant-runoff-voting
      • Imp-language
      • Sidekick
      • Leftist-trees
      • Java
      • Taspi
      • Bitcoin
      • Riscv
      • Des
      • Ethereum
      • X86isa
      • Sha-2
      • Yul
      • Zcash
      • Proof-checker-itp13
      • Regex
      • ACL2-programming-language
      • Json
      • Jfkr
      • Equational
      • Cryptography
      • Poseidon
      • Where-do-i-place-my-book
      • Axe
      • Bigmems
      • Builtins
      • Execloader
      • Aleo
      • Solidity
      • Paco
      • Concurrent-programs
      • Bls12-377-curves
    • Debugging
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
    • Math
    • Testing-utilities
  • Concrete-syntax

Lexer

An executable lexer of PFCS.

This is a simple lexer for the PFCS lexical grammar. Efficiency is not the primary focus for now; correctness and simplicity are. In the future, we may optimize this lexer, or we may use APT to do so.

Note that the lexical grammar for PFCS consists of a subset of ASCII characters that is also a subset of Unicode characters that UTF-8 encodes in a single byte. However, some of this lexer code is written to support the possibility of future concrete syntax that is UTF-8 encoded as multiple bytes. Hence the references to "Unicode characters".

The lexer consists of a collection of lexing functions, each of which takes a list of natural numbers as input, which represents the Unicode codepoints of characters that remain to lex in the PFCS definition being lexed. Each function returns two results: the first result is either an error or an ABNF tree (or list of trees) for the recognized lexeme(s); the second result is the remaining list of natural numbers after the lexeme. While from a conceptual point of view it would be better for all these lexing functions to return a single result that is either an error or a pair consisting of an ABNF tree or tree list plus remaining inputs, by returning two results we make the execution more efficient by avoiding constructing and deconstructing the pair.

Some of the code of this lexer is generated via the parser generation tools in the ABNF library (where `parser' in that context refers to the general idea of recognizing and structuring strings in a formal language, which also describes what PFCS lexing does). Other code is written by hand, due to limitations in the aforementioned parser generation tools, such as the efficiency of the generated code.

Subtopics

Lex-separator
Parse a separator.
Lex-operator
Parse a operator.
Lex-group-letter/digit/_
Parse a ( letter / digit / "_" ).
Lex-token
Parse a token.
Lex-whitespace
Parse a whitespace.
Lex-line-terminator
Parse a line-terminator.
Lex-identifier
Parse a identifier.
Lex-lexeme
Parse a lexeme.
Lex-letter
Parse a letter.
Lex-uppercase-letter
Parse a uppercase-letter.
Lex-lowercase-letter
Parse a lowercase-letter.
Lex-integer
Parse a integer.
Lex-carriage-return
Parse a carriage-return.
Lex-line-feed
Parse a line-feed.
Lex-space
Parse a space.
Lex-optional-hyphen
Parse a [ "-" ].
Lex-optional-cr
Parse a [ carriage-return ].
Lex-numeral
Parse a numeral.
Lex-digit
Parse a digit.
Lex-*-rest-of-identifier
Parse a *( letter / digit / "_" ).
Lex-1*-digit
Lex 1*digit.
Lexemize-pfcs-from-string
Lexes the UTF-8 pfcs-string into a list of lexemes.
Lex-*-lexeme
Parse a *lexeme.
Lex-*-digit
Parse a *digit.
Lexemize-pfcs-from-bytes
Lexes the UTF-8 bytes into a list of lexemes.
Lexemize-pfcs
Lexes the Unicode codepoints pfcs-codepoints into lexemes.
*defparse-pfcs-repetition-table*
Table of repetition parsing functions.
*defparse-pfcs-option-table*
Table of option parsing functions.
Lex-generation-macros
Lexing generation macros specialized to this lexer.
*defparse-pfcs-group-table*
Table of group parsing functions.
Lex-generation-tables
ABNF group, option, and repetition tables for generating lexing functions.