• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
    • Debugging
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
      • Kestrel-books
        • Crypto-hdwallet
        • Apt
        • Error-checking
        • Fty-extensions
        • Isar
        • Kestrel-utilities
        • Set
        • Soft
        • C
          • Syntax-for-tools
          • Atc
            • Atc-implementation
              • Atc-abstract-syntax
                • Irr-tyspecseq
                • Irr-tag-declon
                • Irr-param-declon
                • Irr-iconst-length
                • Irr-iconst
                • Irr-fundef
                • Irr-ext-declon
                • Irr-unop
                • Irr-tyname
                • Irr-transunit
                • Irr-stmt
                • Irr-initer
                • Irr-ident
                • Irr-fileset
                • Irr-expr
                • Irr-block-item
                • Irr-binop
                • Irr-file
              • Atc-pretty-printer
              • Atc-event-and-code-generation
              • Fty-pseudo-term-utilities
              • Atc-term-recognizers
              • Atc-input-processing
              • Atc-shallow-embedding
              • Atc-process-inputs-and-gen-everything
              • Atc-table
              • Atc-fn
              • Atc-pretty-printing-options
              • Atc-types
              • Atc-macro-definition
            • Atc-tutorial
          • Language
          • Representation
          • Transformation-tools
          • Insertion-sort
          • Pack
        • Bv
        • Imp-language
        • Event-macros
        • Java
        • Bitcoin
        • Ethereum
        • Yul
        • Zcash
        • ACL2-programming-language
        • Prime-fields
        • Json
        • Syntheto
        • File-io-light
        • Cryptography
        • Number-theory
        • Lists-light
        • Axe
        • Builtins
        • Solidity
        • Helpers
        • Htclient
        • Typed-lists-light
        • Arithmetic-light
      • X86isa
      • Axe
      • Execloader
    • Math
    • Testing-utilities
  • Atc-implementation

Atc-abstract-syntax

Abstract syntax of C for ATC.

ATC generates C code by generating abstract syntax trees and pretty-printing them. For now we use the abstract syntax from the C language formalization. In fact, that abstract syntax has been initially motivated by ATC.

However, in the future we may have a separate syntax for ATC, for the following reasons:

  1. The abstract syntax from the C language formalization captures syntax after preprocessing, but at some point we may want ATC to generate code with at least some of the preprocessing constructs, such as #includes, and possibly also some (simple) macros. This means that the ATC abstract syntax will have to mix preprocessing constructs with the preprocessed constructs: this is something that may not be part, in this mixed form, of the language formalization, which should differentiate between preprocessing translation units and (preprocessed) translation units.
  2. We might want ATC to generate certain comments in the code, which would require the ATC abstract syntax to incorporate some information about comments. However, in the language formalization, comments, and their removal during one of C's translation phases, would be captured differently, not as part of the abstract syntax over which the language semantics is defined.
  3. While the abstract syntax from the language formalization may be generalized to cover much more of the C language, the abstract syntax for ATC can incorporate restrictions that make it simpler and that fit the C code generated by ATC. In particular, the C syntax for declarations and related entities is quite complex, with significant mutual recursion, and with many constraints not directly captured by the C grammar. For instance, typedef is classified as a storage class specifier, for syntactic convenience apparently [C17:6.7.1/5], even though its role is very different from the others. Thus, by differentiating more explicitly, in our ATC abstract syntax, among different kinds of declarations and related entities, we make things more clear overall, as far as ATC is concerned.

Notwithstanding the above three reasons, in the short term, for expediency, we might actually incorporate preprocessing directives and comments, and impose restrictions, on the abstract syntax from the language formalization, rather than creating a separate abstract syntax for ATC. So long as the language formalization appropriately covers a subset of C, there is nothing inherently wrong with that approach. However, longer-term, as the language formalization is made more general, in particular covering the translation phases [C17:5.1.12] explicitly, we will likely need to differentiate the abstract syntax for ATC from the one(s) from the language formalization. For proof generation, we would provide a mapping from the former to the latter, or in fact we may have the proof apply to the actual concrete syntax, if the language formalization includes parsing.

Some observations about some parts of the abstract syntax in relation to ATC:

  • The fixtype ident allows any ACL2 string, which may thus not be valid C identifiers. However, ATC always generates valid C identifiers.
  • The fixtype ident models only ASCII identifiers. For ATC, ASCII identifiers may be all that we need in the foreseeable future: we may not need to generate C programs with identifiers that include non-ASCII characters, some aspects of which may be implementation-dependent or locale-dependent. Since ASCII identifiers are portable, we plan for ATC to generate only ASCII identifiers, unless there will be reasons to support a broader range.
  • The fixtype ident allows identifiers of any length. In the interest of portability, it is our intention to have ATC generate identifiers of 31 characters or less (which is the minimum of significant characters in (external) identifiers [C17:6.4.2.1/5] [C17:6.4.2.1/6] [C17:5.2.4.1] which may not be a significant limitation.
  • The fixtype iconst does not capture leading 0s in octal and hexadecimal constants. This is not a severe limitation, but at some point we may want ATC to generate something like 0x0000ffff.
  • The fixtype const includes enuemration constants, which, as discussed there, cannot be differentiated from identifier expressions during parsing, but only after static semantic checking. This is not an issue for ATC, which generates, does not parse, C code: ATC can generate one or the other case in the code.
  • The fixtypes for declarations do not support function pointers. Most likely, this support will not be neded for ATC, given that ACL2 is first-order, and thus cannot readily represent C function pointers. (However, perhaps there is a way to represent function pointers with apply$.)
  • The fixtype block-item only supports object declarations. This suffices for ATC currently.
  • The fixtype fundef could be alteratively defined as consisting of a function declaration and a body. However, even though this would work in abstract syntax, in concrete syntax a function declaration has to end with a semicolon (and that is why the grammar rule in [C17:6.9.1/1] does not use a declaration, but rather its components): thus, for the ATC pretty-printer, we want to differentiate between the type specifier sequences and declarators that form a full function declaration, and the type specifier sequences and declarators that are part of a function definition.

Subtopics

Irr-tyspecseq
An irrelevant type specifier sequence.
Irr-tag-declon
An irrelevant structure/union/enumeration declaration.
Irr-param-declon
An irrelevant parameter declaration.
Irr-iconst-length
An irrelevant length suffix.
Irr-iconst
An irrelevant integer constant.
Irr-fundef
An irrelevant function definition.
Irr-ext-declon
An irrelevant external declaration.
Irr-unop
An irrelevant unary operator.
Irr-tyname
An irrelevant type name.
Irr-transunit
An irrelevant translation unit.
Irr-stmt
An irrelevant statement.
Irr-initer
An irrelevant initializer.
Irr-ident
An irrelevant identifier.
Irr-fileset
An irrelevant file set.
Irr-expr
An irrelevant expression.
Irr-block-item
An irrelevant block-item.
Irr-binop
An irrelevant binary operator.
Irr-file
An irrelevant file.