
Lex-identifiers

Handling of identifiers.

The grammars of Verilog-2005 and SystemVerilog-2012 agree on:

identifier ::= simple_identifier
             | escaped_identifier

simple_identifier ::= [ a-zA-Z_ ] { [ a-zA-Z0-9_$ ] }
  (no embedded spaces)
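
The simple_identifier rule can be illustrated with a small reader. This is only a sketch in Python, not VL's ACL2 implementation; the names HEAD_CHARS, TAIL_CHARS, and read_simple_identifier are hypothetical and chosen to mirror the head/tail character classes in the grammar.

```python
import string

# Character classes taken directly from the grammar rule above.
HEAD_CHARS = set(string.ascii_letters + "_")                   # [a-zA-Z_]
TAIL_CHARS = set(string.ascii_letters + string.digits + "_$")  # [a-zA-Z0-9_$]


def read_simple_identifier(text, pos=0):
    """Return (name, new_pos), or (None, pos) if no simple identifier starts at pos.

    Hypothetical helper for illustration only.
    """
    if pos >= len(text) or text[pos] not in HEAD_CHARS:
        return None, pos
    end = pos + 1
    # Greedily consume tail characters; anything else ends the identifier.
    while end < len(text) and text[end] in TAIL_CHARS:
        end += 1
    return text[pos:end], end
```

Note that keywords such as module have exactly the same shape, so a reader like this cannot tell them apart from ordinary identifiers; that distinction has to be made afterwards.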

The Verilog-2005 grammar presents the rule for escaped identifiers as:

escaped_identifier ::= \ { any non-whitespace character } white_space

However, Section 3.7.1 of the Verilog-2005 standard appears to contradict the above definition. It says that escaped identifiers "provide a means of including any of the printable ASCII characters in an identifier (the decimal values 33 through 126...)". Section 5.6.1 of the SystemVerilog-2012 standard says the same thing, and its grammar has been updated with this clarification:

escaped_identifier ::= \ { any printable non-whitespace character } white_space

We therefore restrict the name characters in escaped identifiers to the printable ASCII characters, i.e., characters whose codes are 33-126.
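
As a rough illustration of this restriction, the character test amounts to a range check on the character code. The Python below is a sketch only; the function names are hypothetical and merely echo the recognizers listed under Subtopics.

```python
def is_printable_not_whitespace(ch):
    """True for printable, non-whitespace ASCII characters (codes 33 through 126)."""
    return 33 <= ord(ch) <= 126


def is_printable_not_whitespace_list(chars):
    """True when every character in chars passes the test above."""
    return all(is_printable_not_whitespace(c) for c in chars)
```

Under this check, space (32), tab (9), DEL (127), and all non-ASCII characters are excluded from the name of an escaped identifier.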

Both Verilog-2005 and SystemVerilog agree on the syntax for system identifiers:

system_tf_identifier ::= $[ a-zA-Z0-9_$ ] { [ a-zA-Z0-9_$ ] }

Well, that's arguably true. SystemVerilog adds certain pieces of syntax such as $ and $root that overlap with system_tf_identifier. We generally turn these into special tokens; see vl-lex-system-identifier.
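
A minimal sketch of this idea in Python follows. It is not how vl-lex-system-identifier is actually written: the token-type names are hypothetical, and the sketch ignores any dependence on which language edition is being lexed. It reads a $ followed by as many tail characters as possible, then checks whether the resulting text is one of the special cases.

```python
import string

TAIL_CHARS = set(string.ascii_letters + string.digits + "_$")  # [a-zA-Z0-9_$]

# Hypothetical token-type names, for illustration only.
SPECIAL_TOKENS = {"$": "DOLLAR", "$root": "ROOT"}


def lex_system_identifier(text, pos=0):
    """Return (token_type, lexeme, new_pos), or None if text[pos] is not '$'."""
    if pos >= len(text) or text[pos] != "$":
        return None
    end = pos + 1
    while end < len(text) and text[end] in TAIL_CHARS:
        end += 1
    lexeme = text[pos:end]
    # Special pieces of syntax win over the generic system identifier.
    token_type = SPECIAL_TOKENS.get(lexeme, "SYSTEM_IDENTIFIER")
    return token_type, lexeme, end
```

Because the whole run of tail characters is read before the lookup, $rooty is still an ordinary system identifier rather than $root followed by y.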

Whitespace Minutia

Per Section 3.7.1 of Verilog-2005, the leading backslash character and the terminating whitespace character are not "considered to be part of the identifier", i.e., \cpu3 is treated the same as cpu3. Section 5.6.1 of the SystemVerilog-2012 standard says the same thing. Note that the Verilog grammar treats EOF as whitespace, so we allow an escaped identifier to be closed with EOF -- there just isn't a whitespace character at the end of the PREFIX in that case.

Perhaps a reason for including this whitespace is found on page 351 of the Verilog-2005 standard. A macro with arguments is introduced like `define max(a,b) ... with no whitespace between its name (an identifier) and the opening paren of the argument list. So if you wanted to use an escaped identifier as the name of a macro with arguments, how would you know where the identifier ended and the argument list began? Making the escaped identifier include a whitespace character seems like a dirty trick to accomplish this. We don't support macros with arguments, but we go ahead and include the whitespace in case such support is ever added.

Empty Identifiers

I have not found anything in the spec which explicitly prohibits the empty escaped identifier, i.e., \<whitespace>, from being used. Nevertheless, I exclude it on the grounds that it is suspicious and Cadence does not permit it either. Allowing it would make end-of-define even more complicated to properly support in the preprocessor.
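
The whitespace and empty-identifier rules above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not VL's implementation: read_escaped_identifier is a hypothetical name, and the WHITESPACE set is an assumed choice of terminating characters.

```python
WHITESPACE = " \t\n\r\f"  # assumed set of terminating whitespace characters


def read_escaped_identifier(text, pos=0):
    """Return (name, prefix, new_pos), or None if no escaped identifier starts at pos.

    name   -- the identifier, without the backslash or the closing whitespace
    prefix -- all consumed text, including the backslash and closing whitespace
    """
    if pos >= len(text) or text[pos] != "\\":
        return None
    end = pos + 1
    # Name characters are restricted to printable ASCII, codes 33 through 126.
    while end < len(text) and 33 <= ord(text[end]) <= 126:
        end += 1
    name = text[pos + 1:end]
    if not name:
        return None  # the empty escaped identifier is rejected
    if end == len(text):
        return name, text[pos:end], end  # closed by EOF: no whitespace in prefix
    if text[end] not in WHITESPACE:
        return None  # stopped at a character that is neither printable nor whitespace
    return name, text[pos:end + 1], end + 1
```

For example, reading \cpu3 followed by a space yields the name cpu3, while the consumed prefix keeps both the backslash and the space.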

Notes about Honsing Identifiers

We are always careful to hons the names of the identifier tokens we create. One reason it's a good idea is that identifiers are often repeated many times, so making the actual string part a hons lets us use only one copy of the string. The other big reason is that identifier names are often used in fast-alist lookups, and if the string isn't a hons, then hons-get will have to hons it first anyway. So, by honsing as we create the identifier token, we potentially avoid a lot of repeated, redundant honsing later on.
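
A loose analogy in Python is string interning. The sketch below is not ACL2's hons mechanism, which applies to more than strings and also underlies fast alists; it only shows the "one shared copy per distinct name" idea. The token representation and the function name make_identifier_token are hypothetical.

```python
import sys


def make_identifier_token(name):
    """Build a hypothetical identifier token whose name is the canonical copy.

    sys.intern returns the same string object for every string with equal contents.
    """
    return ("IDENTIFIER", sys.intern(name))


# Two names built independently at run time end up sharing one string object.
first = make_identifier_token("".join(["cp", "u3"]))
second = make_identifier_token("cpu" + str(3))
shared = first[1] is second[1]  # True
```

Doing the interning once, when the token is created, means later users of the name do not each pay to canonicalize it again.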

Subtopics

Vl-printable-not-whitespace-p
Match any printable non-whitespace character.
Vl-simple-id-tail-p
[a-zA-Z0-9_$]
Vl-simple-id-head-p
[a-zA-Z_]
Vl-lex-system-identifier
Try to match a system identifier (or some other special token like $ or $root).
Vl-read-escaped-identifier
Vl-lex-simple-identifier-or-keyword
Match either a keyword or an ordinary, simple identifier.
Vl-lex-escaped-identifier
Vl-printable-not-whitespace-list-p
(vl-printable-not-whitespace-list-p x) recognizes lists where every element satisfies vl-printable-not-whitespace-p.
Vl-simple-id-tail-list-p
(vl-simple-id-tail-list-p x) recognizes lists where every element satisfies vl-simple-id-tail-p.
Vl-simple-id-head-list-p
(vl-simple-id-head-list-p x) recognizes lists where every element satisfies vl-simple-id-head-p.
Vl-read-simple-identifier
Try to match a simple identifier (or any keyword!).