Search-engine friendly clone of the ACL2 documentation.

Segmentation

Ea-to-la

Translate an effective address to a canonical linear address.

Signature
(ea-to-la proc-mode eff-addr seg-reg nbytes x86) → (mv flg lin-addr)
Returns: lin-addr — Type (i64p lin-addr), given (i64p eff-addr).

This translation is illustrated in Intel manual, Mar'17, Vol. 3A, Fig. 3-5, as well in AMD mamual, Oct'2013, Vol. 1, Fig. 2-3 and 2-4. In addition to the effective address, this function takes as input (the index of) a segment register, whose corresponding segment selector, with the effective address, forms the logical address that is turned into the linear address.

This translation is used: when fetching instructions, in which case the effective address is in RIP, EIP, or IP; when accessing the stack implicitly, in which case the effective address is in RSP, ESP, or SP; and when accessing data explicitly, in which case the effective address is calculated by instructions via x86-effective-addr. In the formal model, RIP contains a signed 48-bit integer, RSP contains a signed 64-bit integer, and x86-effective-addr returns a signed 64-bit integer: thus, the guard of this function requires eff-addr to be a signed 64-bit integer. In 64-bit mode, the caller of this function supplies as eff-addr the content of RIP, the content of RSP, or the result of x86-effective-addr. In 32-bit mode, the caller of this function supplies as eff-addr the unsigned 32-bit or 16-bit truncation of the content of RIP (i.e. EIP or IP), the content of RSP (i.e. ESP or SP), or the result of x86-effective-addr; the choice between 32-bit and 16-bit is determined by default address size and override prefixes.

This function also takes as input the number of bytes that have to be read or written starting from the effective address; that is, the chunk of bytes to be accessed goes from eff-addr to eff-addr + nbytes - 1, both inclusive. In 32-bit mode, the eff-addr (the start of the chunk) is checked against the lower bound of the segment, and eff-addr + nbytes - 1 (the end of the chunk) is checked against the upper bound of the segment. In 64-bit mode, these checks are skipped. It is not clear from the Intel and AMD manuals whether the calculation eff-addr + nbytes - 1 should be modular. If it were, the wrap-around would give rise to two separate chunks of bytes to be checked for containment in the segment. According to the article at https://www.embedded.com/electronics-blogs/significant-bits/4024943/Taming-the-x86-beast, the calculation does not wrap around. This is a paragraph from the article:

You also can't ``wrap around'' the end of segment like before. Earlier x86 chips with their 64KB segments allowed pointers to roll over from FFFF to 0000 automatically. Programs that accidentally (or purposely) ran off the end of a segment started over at the beginning. Programs that run off the end of a segment now are greeted with a segmentation fault.

So for now we make the stricter check by calculating eff-addr + nbytes - 1 non-modularly and therefore always having one chunk to check. Note that operations like rme-size always access one contiguous chunk because they forward the translated addresses to operations like rml-size; so if it turned out that eff-addr + nbytes - 1 can wrap around (contrary to what the aforementioned article states), changes may need to be made in rme-size and similar operations as well.

In 32-bit mode, the effective address is added to the base address of the segment; the result is truncated to 32 bits, because the addition should be modulo 2^32, given that the result must be 32 bits (cf. aforementioned figures in Intel and AMD manuals). In 64-bit mode, the addition of the base address of the segment is performed only if the segments are in registers FS and GS; the result is truncated to 64 bits, because the addition should be modulo 2^64, given that the result must be 32 bits (cf. aforementioned figures in Intel and AMD manuals); since in our model addresses are signed, we use i64 instead of n64 to perform this truncation.

If the translation is successful, this function returns a signed 64-bit integer that represents a canonical linear address. In 64-bit mode, when the segment register is not FS or GS, the effective address is returned unmodified as a linear address, because segment translation should be a no-op in this case; otherwise, the effective address is translated to a linear address. In both cases, this linear address is checked to be canonical; if not, an error flag is returned, otherwise, a canonical linear address is returned. In 32-bit mode, the 32-bit linear address that results from the translation is always canonical. If the translation fails, including the check that the linear address is canonical, a non-nil error flag is returned, which provides some information about the failure.

As explained in Intel manual, May'18, Volume 3, Sections 3.4.2 and 5.4.1, a null segment selector can be loaded into DS, ES, FS, and GS, but then a memory access through these segment registers causes a #GP. According to AMD manual, Dec'17, Volume 2, Section 4.5.1, a null segment selector has SI = TI = 0, but no explicit constraint is stated on RPL; Intel manual, May'18, Volume 2, POP specification says that a null segment selector has a value from 0 to 3, from which we infer that a null segment selector may have a non-zero RPL. In this function, we return an error if the visible portion of the segment register is 0-3, and the segment register is not CS or SS. Loading a null segment selector into CS and SS is not allowed, so it is a state invariant that CS and SS do not contain null segment selectors. The null segment selector check is skipped in 64-bit mode (see Intel manual, May'18, Volume 3, Section 5.4.1.1).

Definitions and Theorems

Function: ea-to-la$inline

(defun ea-to-la$inline (proc-mode eff-addr seg-reg nbytes x86)
 (declare (xargs :stobjs (x86)))
 (declare (type (integer 0 4) proc-mode)
          (type (signed-byte 64) eff-addr)
          (type (integer 0 5) seg-reg)
          (type (member 1 2 4 6 8 10 16 32 64)
                nbytes))
 (case proc-mode
  (0
   (b* ((lin-addr (if (or (eql seg-reg 4) (eql seg-reg 5))
                      (b* (((mv base & &)
                            (segment-base-and-bounds 0 seg-reg x86))
                           (lin-addr (i64 (+ base (n64 eff-addr)))))
                        lin-addr)
                    eff-addr))
        ((unless (canonical-address-p lin-addr))
         (mv (list :non-canonical-address lin-addr)
             0)))
     (mv nil lin-addr)))
  (1
   (b*
    (((when (and (not (= seg-reg *cs*))
                 (not (= seg-reg *ss*))
                 (< (seg-visiblei seg-reg x86) 4)))
      (mv (list :null-segment-selector seg-reg)
          0))
     ((mv (the (unsigned-byte 32) base)
          (the (unsigned-byte 33) lower-bound)
          (the (unsigned-byte 32) upper-bound))
      (segment-base-and-bounds proc-mode seg-reg x86))
     (first-addr eff-addr)
     (last-addr (+ eff-addr nbytes -1))
     ((unless (and (<= lower-bound first-addr)
                   (<= last-addr upper-bound)))
      (mv
       (list
        :segment-limit-fail (list seg-reg
                                  eff-addr lower-bound upper-bound))
       0))
     ((the (unsigned-byte 32) lin-addr)
      (n32 (+ (the (unsigned-byte 32) base)
              (the (unsigned-byte 32) eff-addr)))))
    (mv nil lin-addr)))
  (otherwise (mv (list :unimplemented-proc-mode proc-mode)
                 0))))

Theorem: i64p-of-ea-to-la.lin-addr

(defthm i64p-of-ea-to-la.lin-addr
 (implies
     (i64p eff-addr)
     (b* (((mv ?flg ?lin-addr)
           (ea-to-la$inline proc-mode eff-addr seg-reg nbytes x86)))
       (i64p lin-addr)))
 :rule-classes :rewrite)

Theorem: ea-to-la-is-i64p

(defthm ea-to-la-is-i64p
 (implies
    (i64p eff-addr)
    (signed-byte-p 64
                   (mv-nth 1
                           (ea-to-la proc-mode
                                     eff-addr seg-reg nbytes x86))))
 :rule-classes
 (:rewrite
  (:type-prescription
    :corollary
    (implies
         (i64p eff-addr)
         (integerp (mv-nth 1
                           (ea-to-la proc-mode
                                     eff-addr seg-reg nbytes x86))))
    :hints (("Goal" :in-theory '(signed-byte-p integer-range-p))))
  (:linear
   :corollary
   (implies
    (i64p eff-addr)
    (and
      (<= -9223372036854775808
          (mv-nth 1
                  (ea-to-la proc-mode eff-addr seg-reg nbytes x86)))
      (< (mv-nth 1
                 (ea-to-la proc-mode eff-addr seg-reg nbytes x86))
         9223372036854775808)))
   :hints
   (("Goal"
         :in-theory '(signed-byte-p integer-range-p (:e expt)))))))

Theorem: ea-to-la-is-i48p-when-no-error

(defthm ea-to-la-is-i48p-when-no-error
 (implies
    (not (mv-nth 0
                 (ea-to-la proc-mode eff-addr seg-reg nbytes x86)))
    (signed-byte-p 48
                   (mv-nth 1
                           (ea-to-la proc-mode
                                     eff-addr seg-reg nbytes x86))))
 :rule-classes
 (:rewrite
  (:type-prescription
   :corollary
   (implies
     (not (mv-nth 0
                  (ea-to-la proc-mode eff-addr seg-reg nbytes x86)))
     (integerp (mv-nth 1
                       (ea-to-la proc-mode
                                 eff-addr seg-reg nbytes x86))))
   :hints (("Goal" :in-theory '(signed-byte-p integer-range-p))))
  (:linear
   :corollary
   (implies
    (not (mv-nth 0
                 (ea-to-la proc-mode eff-addr seg-reg nbytes x86)))
    (and
      (<= -140737488355328
          (mv-nth 1
                  (ea-to-la proc-mode eff-addr seg-reg nbytes x86)))
      (< (mv-nth 1
                 (ea-to-la proc-mode eff-addr seg-reg nbytes x86))
         140737488355328)))
   :hints
   (("Goal"
         :in-theory '(signed-byte-p integer-range-p (:e expt)))))))

Theorem: ea-to-la-of-xw

(defthm ea-to-la-of-xw
  (implies (and (not (equal fld :msr))
                (not (equal fld :seg-hidden-base))
                (not (equal fld :seg-hidden-limit))
                (not (equal fld :seg-hidden-attr))
                (not (equal fld :seg-visible)))
           (equal (ea-to-la proc-mode eff-addr
                            seg-reg nbytes (xw fld index value x86))
                  (ea-to-la proc-mode
                            eff-addr seg-reg nbytes x86))))

Theorem: ea-to-la-when-64-bit-modep-and-not-fs/gs

(defthm ea-to-la-when-64-bit-modep-and-not-fs/gs
  (implies (and (not (equal seg-reg 4))
                (not (equal seg-reg 5)))
           (equal (ea-to-la 0 eff-addr seg-reg nbytes x86)
                  (if (canonical-address-p eff-addr)
                      (mv nil eff-addr)
                    (mv (list :non-canonical-address eff-addr)
                        0)))))