Search-engine friendly clone of the ACL2 documentation.

Rlp-decoding-executable

Rlp-parse-tree

Parse the RLP encoding of a tree, returning any extra bytes for further parsing.

Signature
(rlp-parse-tree encoding) → (mv error? tree rest)
Arguments: encoding — Guard (byte-listp encoding).
Returns: error? — Type (maybe-rlp-error-p error?).; tree — Type (rlp-treep tree).; rest — Type (byte-listp rest).

This function returns an error result (nil if no error), the decoded tree, and the remaining unparsed bytes. If the error is not nil, we return an irrelevant tree as second result and nil as remaining bytes.

This parser is fairly straightforward, but there is a slight subtlety missed by some implementations, including the reference decoding code in [Wiki:RLP]: in order to recognize only valid encodings (as formalized by rlp-tree-encoding-p), a decoder must reject encodings that use ``non-optimal'' lengths. For instance, the singleton byte array (10) must be encoded as itself (rlp-encode-bytes), and not as (129 10). Note that rlp-bytes-encoding-p does not hold on (129 10), because that sequence of bytes is not in the image of rlp-encode-bytes. (This example talks about encoded byte arrays, but leaf trees are encoded in the same way.)

So our decoding code checks that (i) the form (129 x) is not used with x below 128, (ii) a big endian length has no leading zeros, and (iii) a big endian length is not below 56. Cases (ii) and (iii) apply to both leaf and branching trees, while case (i) applies to leaf trees only. Without these extra checks, the decoder would accept not only all the valid encodings, but also some invalid encodings with non-optimal lengths.

Neither [YP:B] nor [Wiki:RLP] explicitly talk about this, but the fact that [YP:B] prescribes the encodings in all cases can be reasonably taken to exclude the encodings with non-optimal lengths. Furthermore, various GitHub issues for Ethereum implementations regard the acceptance of encodings with non-optimal lengths as bugs. Thus, we take the strict interpretation and regard such encodings as invalid. (Otherwise, formally, encoding would have to be a relation instead of a function.)

The function to parse single trees is mutually recursive with a function to parse lists of trees. The latter does not return the remaining bytes: it is always called with the exact byte sequence that is supposed to encode the trees of the list, one tree after the other.

The termination of these parsing function is proved via a lexicographic measure. The first component is the length of the input; the second component is an ordering on the two parsing functions where the one for trees is smaller than the one for lists of trees. This second component is needed because the function for lists of trees calls the one for trees on the same input (to parse the first tree in the list), and so the ordering on the function makes the overall measure smaller. When the function for trees calls the one for lists of trees, the input has been reduced in length by at least the first byte, and so it is immediate to prove that the overall measure decreases. When the function for lists of trees calls itself recursively, the input has also decreased, but this is a property of the function for trees, which cannot be proved until the function is admitted, i.e. shown to terminate. In other words, the termination here depends on functional properties. Thus, we use mbt to ensure that the length has in fact decreased, which lets us prove termination. Then guard verification ensures that that check is always satisfied. Under the negated mbt condition, we must return some error, but it does not matter which error, since this case never happens.

Before verifying guards, we need to show that rlp-parse-tree returns fewer remaining bytes than the input bytes when there is no error. This is done by the linear rules.

If a sequence of bytes is successfully parsed, a sequence obtained by adding extra bytes at the end is also successfully parsed. The decoded tree is the same. The remaining bytes are extended with the extra bytes. This is all expressed by the theorem rlp-parse-tree-extend. This theorem plays a role in the left inverse proof below.

rlp-parse-tree is left inverse of rlp-encode-tree, over the encodable trees. This implies that rlp-parse-tree accepts all valid encodings of trees. This is expressed by the theorem rlp-parse-tree-of-rlp-encode-tree, which says that if rlp-encode-tree is successful, then parsing is successful, with no remaining bytes, and returns the starting bytes (modulo fixing). The proof is by induction on the encoding functions. It is not particularly difficult. It needs some :expand hints for calls that ACL2's heuristics do not expand; note that there is an :expand hint for each possible range of the first byte of the encoding, except for the range below 128.

rlp-parse-tree is right inverse of rlp-encode-tree, over the valid tree encodings. This implies that rlp-parse-tree accepts only valid encodings of tree: if it accepted an invalid encoding, this right inverse theorem would imply that rlp-encode-tree would map the result back to that invalid encoding, which would therefore be a valid encoding, contradicting the initial assumption. This right inverse theorem is rlp-encode-tree-of-rlp-parse-tree, which says that if decoding is successful, then encoding is successful and the resulting encoding, concatenated with the reamining bytes from the decoding, is the initial input of the encoding (modulo fixing). The proof is by induction on the parsing functions. It is not particularly difficult.

Without the extra checks for optimal lengths in the parser, the left inverse theorem would still be provable, but the right inverse theorem would not.

Function: rlp-parse-tree

(defun rlp-parse-tree (encoding)
 (declare (xargs :guard (byte-listp encoding)))
 (b*
  ((encoding (byte-list-fix encoding))
   (irrelevant (rlp-tree-leaf nil))
   ((when (endp encoding))
    (mv (rlp-error-no-bytes)
        irrelevant nil))
   ((cons first encoding) encoding)
   ((when (< first 128))
    (mv nil (rlp-tree-leaf (list first))
        encoding))
   ((when (<= first 183))
    (b*
     ((len (- first 128))
      ((when (< (len encoding) len))
       (mv
        (rlp-error-fewer-bytes-than-short-length (list first)
                                                 len (len encoding))
        irrelevant nil))
      (bytes (take len encoding))
      ((when (and (= len 1) (< (car bytes) 128)))
       (mv (rlp-error-non-optimal-short-length
                (list first (car bytes)))
           irrelevant nil))
      (encoding (nthcdr len encoding)))
     (mv nil (rlp-tree-leaf bytes)
         encoding)))
   ((when (< first 192))
    (b*
     ((lenlen (- first 183))
      ((when (< (len encoding) lenlen))
       (mv (rlp-error-fewer-bytes-than-length-of-length
                (list first)
                lenlen (len encoding))
           irrelevant nil))
      (len-bytes (take lenlen encoding))
      ((unless (equal (trim-bendian* len-bytes)
                      len-bytes))
       (mv (rlp-error-leading-zeros-in-long-length
                (cons first len-bytes))
           irrelevant nil))
      (encoding (nthcdr lenlen encoding))
      (len (bebytes=>nat len-bytes))
      ((when (<= len 55))
       (mv
          (rlp-error-non-optimal-long-length (cons first len-bytes))
          irrelevant nil))
      ((when (< (len encoding) len))
       (mv (rlp-error-fewer-bytes-than-long-length
                (cons first len-bytes)
                len (len encoding))
           irrelevant nil))
      (bytes (take len encoding))
      (encoding (nthcdr len encoding)))
     (mv nil (rlp-tree-leaf bytes)
         encoding)))
   ((when (<= first 247))
    (b*
     ((len (- first 192))
      ((when (< (len encoding) len))
       (mv
        (rlp-error-fewer-bytes-than-short-length (list first)
                                                 len (len encoding))
        irrelevant nil))
      (subencoding (take len encoding))
      (encoding (nthcdr len encoding))
      ((mv error? subtrees)
       (rlp-parse-tree-list subencoding))
      ((when error?)
       (mv (rlp-error-subtree error?)
           irrelevant nil)))
     (mv nil (rlp-tree-branch subtrees)
         encoding)))
   (lenlen (- first 247))
   ((when (< (len encoding) lenlen))
    (mv (rlp-error-fewer-bytes-than-length-of-length
             (list first)
             lenlen (len encoding))
        irrelevant nil))
   (len-bytes (take lenlen encoding))
   ((unless (equal (trim-bendian* len-bytes)
                   len-bytes))
    (mv
     (rlp-error-leading-zeros-in-long-length (cons first len-bytes))
     irrelevant nil))
   (encoding (nthcdr lenlen encoding))
   (len (bebytes=>nat len-bytes))
   ((when (<= len 55))
    (mv (rlp-error-non-optimal-long-length (cons first len-bytes))
        irrelevant nil))
   ((when (< (len encoding) len))
    (mv
      (rlp-error-fewer-bytes-than-long-length (cons first len-bytes)
                                              len (len encoding))
      irrelevant nil))
   (subencoding (take len encoding))
   (encoding (nthcdr len encoding))
   ((mv error? subtrees)
    (rlp-parse-tree-list subencoding))
   ((when error?)
    (mv (rlp-error-subtree error?)
        irrelevant nil)))
  (mv nil (rlp-tree-branch subtrees)
      encoding)))

Function: rlp-parse-tree-list

(defun rlp-parse-tree-list (encoding)
  (declare (xargs :guard (byte-listp encoding)))
  (b* (((when (endp encoding)) (mv nil nil))
       ((mv error? tree encoding1)
        (rlp-parse-tree encoding))
       ((when error?) (mv error? nil))
       ((unless (mbt (< (len encoding1) (len encoding))))
        (mv (rlp-error-no-bytes) nil))
       ((mv error? trees)
        (rlp-parse-tree-list encoding1))
       ((when error?) (mv error? nil)))
    (mv nil (cons tree trees))))

Theorem: return-type-of-rlp-parse-tree.error?

(defthm return-type-of-rlp-parse-tree.error?
  (b* (((mv ?error? ?tree common-lisp::?rest)
        (rlp-parse-tree encoding)))
    (maybe-rlp-error-p error?))
  :rule-classes :rewrite)

Theorem: return-type-of-rlp-parse-tree.tree

(defthm return-type-of-rlp-parse-tree.tree
  (b* (((mv ?error? ?tree common-lisp::?rest)
        (rlp-parse-tree encoding)))
    (rlp-treep tree))
  :rule-classes :rewrite)

Theorem: return-type-of-rlp-parse-tree.rest

(defthm return-type-of-rlp-parse-tree.rest
  (b* (((mv ?error? ?tree common-lisp::?rest)
        (rlp-parse-tree encoding)))
    (byte-listp rest))
  :rule-classes :rewrite)

Theorem: return-type-of-rlp-parse-tree-list.error?

(defthm return-type-of-rlp-parse-tree-list.error?
  (b* (((mv ?error? ?trees)
        (rlp-parse-tree-list encoding)))
    (maybe-rlp-error-p error?))
  :rule-classes :rewrite)

Theorem: return-type-of-rlp-parse-tree-list.trees

(defthm return-type-of-rlp-parse-tree-list.trees
  (b* (((mv ?error? ?trees)
        (rlp-parse-tree-list encoding)))
    (rlp-tree-listp trees))
  :rule-classes :rewrite)

Theorem: len-of-rlp-parse-tree

(defthm len-of-rlp-parse-tree
  (<= (len (mv-nth 2 (rlp-parse-tree encoding)))
      (len encoding))
  :rule-classes :linear)

Theorem: len-of-rlp-parse-tree-when-no-error

(defthm len-of-rlp-parse-tree-when-no-error
  (implies (not (mv-nth 0 (rlp-parse-tree encoding)))
           (< (len (mv-nth 2 (rlp-parse-tree encoding)))
              (len encoding))))

Theorem: rlp-parse-tree-of-byte-list-fix-encoding

(defthm rlp-parse-tree-of-byte-list-fix-encoding
  (equal (rlp-parse-tree (byte-list-fix encoding))
         (rlp-parse-tree encoding)))

Theorem: rlp-parse-tree-byte-list-equiv-congruence-on-encoding

(defthm rlp-parse-tree-byte-list-equiv-congruence-on-encoding
  (implies (byte-list-equiv encoding encoding-equiv)
           (equal (rlp-parse-tree encoding)
                  (rlp-parse-tree encoding-equiv)))
  :rule-classes :congruence)

Theorem: rlp-parse-tree-extend

(defthm rlp-parse-tree-extend
  (implies
       (not (mv-nth 0 (rlp-parse-tree encoding)))
       (and (not (mv-nth 0
                         (rlp-parse-tree (append encoding more))))
            (equal (mv-nth 1
                           (rlp-parse-tree (append encoding more)))
                   (mv-nth 1 (rlp-parse-tree encoding)))
            (equal (mv-nth 2
                           (rlp-parse-tree (append encoding more)))
                   (append (mv-nth 2 (rlp-parse-tree encoding))
                           (byte-list-fix more))))))

Theorem: rlp-parse-tree-of-rlp-encode-tree

(defthm rlp-parse-tree-of-rlp-encode-tree
  (b* (((mv e-error? encoding)
        (rlp-encode-tree tree))
       ((mv d-error? tree1 rest)
        (rlp-parse-tree encoding)))
    (implies (not e-error?)
             (and (not d-error?)
                  (not (consp rest))
                  (equal tree1 (rlp-tree-fix tree))))))

Theorem: rlp-parse-tree-list-of-rlp-encode-tree-list

(defthm rlp-parse-tree-list-of-rlp-encode-tree-list
  (b* (((mv e-error? encoding)
        (rlp-encode-tree-list trees))
       ((mv d-error? trees1)
        (rlp-parse-tree-list encoding)))
    (implies (not e-error?)
             (and (not d-error?)
                  (equal trees1 (rlp-tree-list-fix trees))))))

Theorem: rlp-encode-tree-of-rlp-parse-tree

(defthm rlp-encode-tree-of-rlp-parse-tree
  (b* (((mv d-error? tree rest)
        (rlp-parse-tree encoding))
       ((mv e-error? encoding1)
        (rlp-encode-tree tree)))
    (implies (not d-error?)
             (and (not e-error?)
                  (equal (append encoding1 rest)
                         (byte-list-fix encoding))))))

Theorem: rlp-encode-tree-list-of-rlp-parse-tree-list

(defthm rlp-encode-tree-list-of-rlp-parse-tree-list
  (b* (((mv d-error? trees)
        (rlp-parse-tree-list encoding))
       ((mv e-error? encoding1)
        (rlp-encode-tree-list trees)))
    (implies (not d-error?)
             (and (not e-error?)
                  (equal encoding1 (byte-list-fix encoding))))))

Subtopics

Rlp-parse-tree-list