
    Ubyte16-to-utf8

    UTF-8 encoding of a 16-bit Unicode code point.

    Signature
    (ubyte16-to-utf8 codepoint) → bytes
    Arguments
    codepoint — Guard (ubyte16p codepoint).
    Returns
    bytes — Type (ubyte8-listp bytes).

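    The evaluation of plain string literals in Yul involves turning Unicode escapes into their UTF-8 encodings. This function computes that encoding for a single code point. Since the code point is at most 16 bits wide, the result is always one, two, or three bytes long.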

    The encoding is as follows (see, e.g., the Wikipedia page on UTF-8):

    • A code point between 0 and 7Fh, which fits in 7 bits abcdefg, is encoded as one byte 0abcdefg.
    • A code point between 80h and 7FFh, which has 8 to 11 significant bits and is written as 11 bits abcdefghijk (zero-padded on the left), is encoded as two bytes 110abcde 10fghijk.
    • A code point between 800h and FFFFh, which has 12 to 16 significant bits and is written as 16 bits abcdefghijklmnop (zero-padded on the left), is encoded as three bytes 1110abcd 10efghij 10klmnop.
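
    As a worked check of the two multi-byte layouts above, the following sketch applies the corresponding shifts and masks by hand, using only the ACL2 built-ins logior, logand, and ash. The sample code points (E9h and 20ACh) are chosen here for illustration and do not come from the Yul library. Each form is wrapped in assert-event, so the expected byte list is checked when the form is submitted.

```lisp
;; Two-byte case: E9h = 000 1110 1001 when zero-padded to 11 bits.
;; abcde = 00011 goes after the 110 prefix,
;; fghijk = 101001 goes after the 10 prefix.
(assert-event
 (equal (list (logior #b11000000 (ash #xE9 -6))
              (logior #b10000000 (logand #xE9 #b111111)))
        '(#xC3 #xA9)))

;; Three-byte case: 20ACh = 0010 0000 1010 1100 (16 bits).
;; abcd = 0010, efghij = 000010, klmnop = 101100.
(assert-event
 (equal (list (logior #b11100000 (ash #x20AC -12))
              (logior #b10000000 (logand (ash #x20AC -6) #b111111))
              (logior #b10000000 (logand #x20AC #b111111)))
        '(#xE2 #x82 #xAC)))
```

    The binary constants #b11000000, #b11100000, #b10000000, and #b111111 are the decimal values 192, 224, 128, and 63 that appear in the definition.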

    Definitions and Theorems

    Function: ubyte16-to-utf8

    (defun ubyte16-to-utf8 (codepoint)
      (declare (xargs :guard (ubyte16p codepoint)))
      (let ((__function__ 'ubyte16-to-utf8))
        (declare (ignorable __function__))
        (b* ((codepoint (ubyte16-fix codepoint)))
          (cond ((<= codepoint 127) (list codepoint))
                ((<= codepoint 2047)
                 (list (logior 192 (ash codepoint -6))
                       (logior 128 (logand codepoint 63))))
                ((<= codepoint 65535)
                 (list (logior 224 (ash codepoint -12))
                       (logior 128 (logand (ash codepoint -6) 63))
                       (logior 128 (logand codepoint 63))))
                (t (impossible))))))
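
    A few sample calls may make the case split more concrete. The sketch below assumes that the book defining this function has already been included in the session and that the function is in the "YUL" package; the package prefix is an assumption, so adjust it if your session differs. The inputs sit on the boundaries of the three cond branches.

```lisp
;; Assumption: the defining book is loaded and the symbol is in the
;; "YUL" package (hypothetical prefix; adjust as needed).

;; Largest one-byte code point.
(assert-event (equal (yul::ubyte16-to-utf8 #x7F) '(#x7F)))

;; Smallest and largest two-byte code points.
(assert-event (equal (yul::ubyte16-to-utf8 #x80) '(#xC2 #x80)))
(assert-event (equal (yul::ubyte16-to-utf8 #x7FF) '(#xDF #xBF)))

;; Smallest and largest three-byte code points.
(assert-event (equal (yul::ubyte16-to-utf8 #x800) '(#xE0 #xA0 #x80)))
(assert-event (equal (yul::ubyte16-to-utf8 #xFFFF) '(#xEF #xBF #xBF)))
```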

    Theorem: ubyte8-listp-of-ubyte16-to-utf8

    (defthm ubyte8-listp-of-ubyte16-to-utf8
      (b* ((bytes (ubyte16-to-utf8 codepoint)))
        (ubyte8-listp bytes))
      :rule-classes :rewrite)

    Theorem: ubyte16-to-utf8-of-ubyte16-fix-codepoint

    (defthm ubyte16-to-utf8-of-ubyte16-fix-codepoint
      (equal (ubyte16-to-utf8 (ubyte16-fix codepoint))
             (ubyte16-to-utf8 codepoint)))

    Theorem: ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint

    (defthm ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint
      (implies (acl2::ubyte16-equiv codepoint codepoint-equiv)
               (equal (ubyte16-to-utf8 codepoint)
                      (ubyte16-to-utf8 codepoint-equiv)))
      :rule-classes :congruence)