Implementation of the parser of ABNF grammars.

This is a recursive-descent, backtracking parser. There is a parsing function for every rule, and parsing functions for certain groups, options, and repetitions. There are also parameterized parsing functions for terminals (natural numbers) matching exact values, ranges, and (case-insensitively) characters.

Some of the parsing functions that make up the parser are *Seq* parsing primitives. The remarks below apply to those functions as well.

Each of these parsing functions
takes as input a list of natural numbers (i.e. terminals),
and returns as outputs
(i) an indication of success or failure,
(ii) the tree or list of trees
obtained from parsing a prefix of the input
(or *Seq* macros,
with which these parsing functions are implemented.

The `parse-grammar` top-level function
takes as input a list of natural numbers
and returns as output just a tree, or `parse-grammar-from-file` function
takes as input a file name and calls `parse-grammar` on its content,
returning a tree or

Each parsing function definition is accompanied by a theorem stating that the function fixes its input list of natural numbers. The proof of each such theorem uses, as rewrite rules, the theorems for the parsing functions called by the parsing function whose theorem is being proved.

As explained in the documentation of `parse-grammar*`,
the grammar of the ABNF concrete syntax [RFC:4] is ambiguous.
The rule

Aside from the

- In the
repeat rule, a look-ahead of an unbounded number ofDIGIT s is needed to determine the alternative (the second alternative if"*" is found after theDIGIT s, otherwise the first alternative). - In the
concatenation rule, a look-ahead of an unbounded number ofc-wsp s is needed to determine where aconcatenation ends (it does if norepetition is found after thec-wsp s, otherwise theconcatenation continues with the foundrepetition ).

Aside from the

- In the
defined-as rule, a look-ahead of two symbols is needed to distinguish"=" and"=/" . - In the
element rule, a look-ahead of two symbols is needed to distinguishnum-val andchar-val (the two symbols are"%" and the one after). - In the
char-val rule, a look-ahead of two symbols is needed to distinguishcase-insensitive-string andcase-sensitive-string (the two symbols are"%" and the one after).

In each of the three rules listed above,
the two choices have the first character in common.
Thus, it may seem that these rules are actually LL(1),
by first parsing the first character in common
and then deciding how to proceed based on the next character.
However, each character pair like `parse-ichar2`
which produces a single leaf tree
with the list of those two character codes,
not via two calls to `parse-ichar`
which would produce two leaf trees
each with a singleton list of one character code.
If the rules were formulated as concatenations of single-character strings
(e.g.

Aside from the

The parser resolves the `parse-*cwsp` always try the extra `parse-*cwsp` tries to parse as many

The look-ahead for the LL(*), LL(2), and LL(1) rules is handled via backtracking. The amount of backtracking is expected to be small in reasonable grammars.

The termination of the singly recursive parsing functions
(e.g. `parse-*bit`)
is proved by showing that the size of the input decreases.

The termination of the mutually recursive parsing functions
(i.e. `parse-alternation`, `parse-concatenation`, etc.)
is proved via a lexicographic measure consisting of
the size of the input and an ordering of the parsing functions.
This is explained in the following paragraphs.

Since `parse-alternation` calls `parse-concatenation`
on the same input,
the size of the input alone is not sufficient
to show that the mutually recursive parsing functions terminate.
But `parse-concatenation` never
(indirectly) calls `parse-alternation` on the same input:
it has to go through `parse-group` or `parse-option`,
which consume a `parse-alternation` on, therefore, a smaller input.
So if we order the parsing functions, by assigning numbers to them,
so that `parse-alternation` has
a larger order number than `parse-concatenation`,
either the size of the input goes down,
or it stays the same but the parsing function order number goes down.
In other words, the lexicographic measure goes down.

To establish the relative ordering of the parsing functions, we look at which ones (may) call which other ones on the same input: the former must be (assigned) larger (order numbers) than the latter. Thus, we have the following ordering constraints:

`parse-alternation`must be larger than`parse-concatenation`.`parse-concatenation`must be larger than`parse-repetition`.`parse-repetition`must be larger than`parse-element`. (The former calls the latter on the same input if`parse-?repeat`does not consume any input.)`parse-element`must be larger than`parse-group`.`parse-element`must be larger than`parse-option`.`parse-alt-rest`must be larger than`parse-alt-rest-comp`.`parse-conc-rest`must be larger than`parse-conc-rest-comp`.

These constraints provide a partial order on the parsing function, which we can totalize as follows (from smallest to largest):

`parse-conc-rest-comp``parse-conc-rest``parse-alt-rest-comp``parse-alt-rest``parse-option``parse-group``parse-element``parse-repetition``parse-concatenation``parse-alternation`

Note that when a smaller function calls a larger or equal function,
it does so on a smaller input.
In particular:
`parse-group` and `parse-option` call `parse-alternation`
only after consuming a `parse-alt-rest-comp` calls `parse-concatenation`
only after consuming at least a `parse-conc-rest-comp` calls `parse-repetition`
only after consuming at least one

The theorems about input lengths that accompany the parsing function definitions are used in the termination proofs, both of the singly recursive functions and of the mutually recursive functions.

The termination of the mutually recursive parsing functions
involves an additional complication.
After `parse-alternation` calls `parse-concatenation`,
it may call `parse-alt-rest`,
which involves a proof obligation that
the input to `parse-alt-rest`,
which is the remaining input after `parse-concatenation`,
has length less than or equal to the input to `parse-alternation`.
This is the case, because `parse-concatenation` always returns
a remaining input of length less than or equal to its input.
But this property can be only proved
after `parse-concatenation` is admitted,
that is after its (mutual) termination has been proved:
for the termination proof, it is like an uninterpreted function.
The same holds for the interaction between
other mutually recursive parsing functions.

The known solution to the issue just discussed
is to add tests saying that
the remaining input of `parse-concatenation`
is smaller than the initial input to `parse-alternation`
(which is also the input to `parse-concatenation`).
This way the termination proof can proceeed.
However, since those tests are always true,
they can be wrapped in `mbt`,
which engenders the obligation to prove that are always true,
as part of guard verification
(and additionally, their execution is avoided).
The *Seq* macros provide operators `mbt` tests of inequalities
on the lengths of inputs and remaining inputs.
So we use those in our mutually recursive parsing functions.

- Parse-grammar-from-file
- Parse a file into an ABNF grammar.
- Parse-ichar2
- Parse two natural numbers that encode, case-insensitively, two given characters, into a tree that matches a case-insensitive character value notation that consists of those two characters.
- Parse-ichar
- Parse a natural number that encodes, case-insensitively, a given character, into a tree that matches a case-insensitive character value notation that consists of that character.
- Parse-in-either-range
- Parse a natural number in one of two given ranges into a tree that matches a group consisting of an alternation of the corresponding range numeric value notations.
- Parse-*-in-either-range
- Parse zero or more natural numbers, each of which in one of two given ranges, into a list of trees that matches the repetition of zero or more alternations of the corresponding range numeric value notations.
- Parse-bit
- Parse a bit.
- Parse-equal-/-equal-slash
- Parse the group
("=" / "=/") . - Parse-wsp
- Parse a space or horizontal tab.
- Parse-*digit-star-*digit
- Parse a group
(*DIGIT "*" *DIGIT) . - Parse-hexdig
- Parse a hexadecimal digit.
- Parse-cwsp
- Parse either white space, or a comment and new line followed by white space.
- Parse-concatenation
- Parse a concatenation.
- Parse-case-insensitive-string
- Parse a case-insensitive character value notation.
- Parse-alpha
- Parse a letter.
- Parse-dot-1*bit
- Parse a group
("." 1*BIT) . - Parse-case-sensitive-string
- Parse a case-sensitive character value notation.
- Parse-bin/dec/hex-val
- Parse a group
(bin-val / dec-val / hex-val) . - Parse-alpha/digit/dash
- Parse a group
(ALPHA / DIGIT / "-") . - Parse-rule
- Parse a rule.
- Parse-repeat
- Parse a repetition range.
- Parse-cnl
- Parse a comment or new line.
- Parse-sp
- Parse a space.
- Parse-rule-/-*cwsp-cnl
- Parse a group
( rule / (*c-wsp c-nl) ) . - Parse-element
- Parse an element.
- Parse-dec-val
- Parse a decimal numeric value notation.
- Parse-bin-val
- Parse a binary numeric value notation.
- Parse-alternation
- Parse an alternation.
- Parse-*cwsp-cnl
- Parse a group
(*c-wsp c-nl) . - Parse-*bit
- Parse a repetition of zero or more bits.
- Parse-wsp/vchar
- Parse a group
(WSP / VCHAR) . - Parse-quoted-string
- Parse a quoted string.
- Parse-prose-val
- Parse a prose value notation.
- Parse-hex-val
- Parse a hexadecimal numeric value notation.
- Parse-grammar
- Parse a sequence of natural numbers into an ABNF grammar.
- Parse-elements
- Parse the definiens of a rule.
- Parse-defined-as
- Parse the text between a rule name and its definiens.
- Parse-dash-1*hexdig
- Parse a group
("-" 1*HEXDIG) . - Parse-comment
- Parse a comment.
- Parse-char-val
- Parse a character value notation.
- Parse-1*bit
- Parse a repetition of one or more bits.
- Parse-rulename
- Parse a rule name.
- Parse-num-val
- Parse a numeric value notation.
- Parse-htab
- Parse a horizontal tab.
- Parse-hex-val-rest
- Parse an option
[ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ] , which is the rest of the definiens ofhex-val . - Parse-dot-1*hexdig
- Parse a group
("." 1*HEXDIG) . - Parse-dot-1*digit
- Parse a group
("." 1*DIGIT) . - Parse-digit
- Parse a decimal digit.
- Parse-dec-val-rest
- Parse an option
[ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ] , which is the rest of the definiens ofdec-val . - Parse-dash-1*digit
- Parse a group
("-" 1*DIGIT) . - Parse-dash-1*bit
- Parse a group
("-" 1*BIT) . - Parse-crlf
- Parse a carriage return followed by a line feed.
- Parse-cr
- Parse a carriage return.
- Parse-conc-rest-comp
- Parse a group
(1*c-wsp repetition) , which is a component of the rest of the definiens ofconcatenation . - Parse-cnl-wsp
- Parse a group
(c-nl WSP) . - Parse-bin-val-rest
- Parse an option
[ 1*("." 1*BIT) / ("-" 1*BIT) ] , which is the rest of the definiens ofbin-val . - Parse-alt-rest-comp
- Parse a group
(*c-wsp "/" *c-wsp concatenation) , which is a component of the rest of the definiens ofalternation . - Parse-*cwsp
- Parse a repetition of zero or more
c-wsp s. - Parse-vchar
- Parse a visible character.
- Parse-rulelist
- Parse a rule list.
- Parse-option
- Parse an option.
- Parse-group
- Parse a group.
- Parse-1*-dot-1*hexdig
- Parse a repetition
1*("." 1*HEXDIG) . - Parse-1*-dot-1*digit
- Parse a repetition
1*("." 1*DIGIT) . - Parse-1*-dot-1*bit
- Parse a repetition
1*("." 1*BIT) . - Parse-*-rule-/-*cwsp-cnl
- Parse a repetition
*( rule / (*c-wsp c-nl) ) . - Parse-*-alpha/digit/dash
- Parse a repetition
*(ALPHA / DIGIT / "-") . - Parse-lf
- Parse a line feed.
- Parse-dquote
- Parse a double quote.
- Parse-1*hexdig
- Parse a repetition of one or more hexadecimal digits.
- Parse-1*cwsp
- Parse a repetition of one or more
c-wsp s. - Parse-?repeat
- Parse an optional repetition range.
- Parse-*-dot-1*hexdig
- Parse a repetition
*("." 1*HEXDIG) . - Parse-*-dot-1*digit
- Parse a repetition
*("." 1*DIGIT) . - Parse-*-dot-1*bit
- Parse a repetition
*("." 1*BIT) . - Parse-repetition
- Parse a repetition.
- Parse-1*digit
- Parse a repetition of one or more decimal digits.
- Parse-?%i
- Parse an option
[ "%i" ] . - Parse-*wsp/vchar
- Parse a repetition
*(WSP / VCHAR) . - Parse-*hexdig
- Parse a repetition of zero or more hexadecimal digits.
- Parse-*digit
- Parse a repetition of zero or more decimal digits.
- Parse-alt-rest
- Parse a repetition
*(*c-wsp "/" *c-wsp concatenation) , which is the rest of the definiens ofalternation . - Parse-conc-rest
- Parse a repetition
*(1*c-wsp repetition) , which is the rest of the definiens ofconcatenation . - *grammar-parser-error-msg*
- Message for grammar parsing errors.