
    Ubyte16-to-utf8

    UTF-8 encoding of a 16-bit Unicode code point.

    Signature
    (ubyte16-to-utf8 codepoint) → bytes
    Arguments
    codepoint — Guard (ubyte16p codepoint).
    Returns
    bytes — Type (ubyte8-listp bytes).

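    The evaluation of plain string literals in Yul involves turning Unicode escapes into their UTF-8 encodings. This function performs that encoding for a single 16-bit code point, returning the resulting one, two, or three bytes as a list.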

    The encoding is as follows (see, e.g., the Wikipedia page on UTF-8):

    • A code point between 0 and 7Fh, which consists of 7 bits abcdefg, is encoded as one byte 0abcdefg.
    • A code point between 80h and 7FFh, which consists of 8 to 11 bits abcdefghijk, is encoded as two bytes 110abcde 10fghijk.
    • A code point between 800h and FFFFh, which consists of 12 to 16 bits abcdefghijklmnop, is encoded as three bytes 1110abcd 10efghij 10klmnop.
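
    The three cases above can be cross-checked outside ACL2. The following is a minimal Python sketch that mirrors the bit manipulations of the definition; it is not part of the Yul library, and the name ubyte16_to_utf8 is a hypothetical stand-in for the ACL2 function. One deliberate difference, chosen for this sketch only: it raises an error on out-of-range input, whereas the ACL2 function first applies ubyte16-fix to its argument.

```python
from typing import List


def ubyte16_to_utf8(codepoint: int) -> List[int]:
    """Illustrative mirror of the ACL2 function ubyte16-to-utf8 (hypothetical name)."""
    # Assumption of this sketch: reject bad input instead of fixing it.
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("codepoint must fit in 16 bits")
    if codepoint <= 0x7F:
        # 7 bits abcdefg -> 0abcdefg
        return [codepoint]
    if codepoint <= 0x7FF:
        # 11 bits abcdefghijk -> 110abcde 10fghijk
        return [0xC0 | (codepoint >> 6), 0x80 | (codepoint & 0x3F)]
    # 16 bits abcdefghijklmnop -> 1110abcd 10efghij 10klmnop
    return [
        0xE0 | (codepoint >> 12),
        0x80 | ((codepoint >> 6) & 0x3F),
        0x80 | (codepoint & 0x3F),
    ]


# One example per case: 'A' (41h), 'é' (E9h), and the euro sign (20ACh).
print([hex(b) for b in ubyte16_to_utf8(0x41)])    # ['0x41']
print([hex(b) for b in ubyte16_to_utf8(0xE9)])    # ['0xc3', '0xa9']
print([hex(b) for b in ubyte16_to_utf8(0x20AC)])  # ['0xe2', '0x82', '0xac']
```

    Python's own UTF-8 encoder refuses the surrogate range D800h to DFFFh, so a comparison against chr(cp).encode("utf-8") has to skip those code points; the sketch itself, like the definition below, encodes them as ordinary three-byte sequences.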

    Definitions and Theorems

    Function: ubyte16-to-utf8

    (defun ubyte16-to-utf8 (codepoint)
      (declare (xargs :guard (ubyte16p codepoint)))
      (let ((__function__ 'ubyte16-to-utf8))
        (declare (ignorable __function__))
        (b* ((codepoint (ubyte16-fix codepoint)))
          (cond ((<= codepoint 127) (list codepoint))
                ((<= codepoint 2047)
                 (list (logior 192 (ash codepoint -6))
                       (logior 128 (logand codepoint 63))))
                ((<= codepoint 65535)
                 (list (logior 224 (ash codepoint -12))
                       (logior 128 (logand (ash codepoint -6) 63))
                       (logior 128 (logand codepoint 63))))
                (t (impossible))))))

    Theorem: ubyte8-listp-of-ubyte16-to-utf8

    (defthm ubyte8-listp-of-ubyte16-to-utf8
      (b* ((bytes (ubyte16-to-utf8 codepoint)))
        (ubyte8-listp bytes))
      :rule-classes :rewrite)

    Theorem: ubyte16-to-utf8-of-ubyte16-fix-codepoint

    (defthm ubyte16-to-utf8-of-ubyte16-fix-codepoint
      (equal (ubyte16-to-utf8 (ubyte16-fix codepoint))
             (ubyte16-to-utf8 codepoint)))

    Theorem: ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint

    (defthm ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint
      (implies (acl2::ubyte16-equiv codepoint codepoint-equiv)
               (equal (ubyte16-to-utf8 codepoint)
                      (ubyte16-to-utf8 codepoint-equiv)))
      :rule-classes :congruence)