Generating x86 code

 
    The x86 instruction set has evolved organically over the decades, so it is quite complex. One goal of this document is to explain the small subset of x86 instructions that we need in this course. Another goal is to help you make the transition from generating SaM code to generating x86 code.
   
Resources:
  1. This is an excellent high-level introduction to the x86 ISA: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
  2. SASM is a cross-platform IDE for developing x86 assembly programs. Like SaM, it has an intuitive GUI, and to the extent that writing x86 assembler code can be fun, SASM makes it fun. The Windows download is completely self-contained, and I was able to get it running in a couple of minutes. On Linux or Mac, installation can be more complex; see the SASM website for more information: https://dman95.github.io/SASM/english.html
  3. You can of course read Intel's ISA manuals (insert smiley face here): http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

What are the main differences between the x86 and SaM ISAs?

 

x86 assemblers

    One advantage of SaM is that there is no SaM but SaM and Pingali is its prophet. For x86 assembly code, on the other hand, there is a bewildering number of different assemblers and syntaxes, and you will see names like NASM (Netwide Assembler), MASM (Microsoft Macro Assembler), GAS (GNU Assembler), AT&T syntax, Intel syntax, etc. They differ in syntax and directives, and assembly programs written for one assembler will usually not work with the others. This means that x86 code you find on the Internet may not work with your assembler. Another issue is system calls: assembly programs that make Windows system calls for, say, printing values will not work on Linux, because system calls are different on the two operating systems. A final issue is linking with routines in libraries like libc: if you want to call these routines, your code must follow the platform's standard calling convention for passing arguments and returning results.

    In this course, we will use the NASM syntax, which is supported by the SASM IDE; it is simpler than the others. The SASM documentation says that it also supports MASM and other formats, but you should not use them, since that complicates grading. One advantage of SASM is that it provides its own I/O routines (macros), and these are expanded into the appropriate platform-specific code for whatever platform you are generating code for. This means you do not need to worry about system calls, at least for I/O, and you get a convenient level of portability.

    Here is a simple SASM assembly file. The program is in the .text section. The entry point into your code must be labeled CMAIN, and it must be declared global. The code calls the SASM print macro PRINT_DEC to print the 32-bit (4-byte) integer 666, and then returns.

    %include "io.inc"

    section .text

        global CMAIN

    CMAIN:
        push ebp ; set up frame base register
        mov ebp, esp

        PRINT_DEC 4, 666

        pop ebp ; restore frame base register
        xor eax, eax ; return 0 to the caller

        ret
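    A Bali compiler that targets SASM has to wrap its generated instructions in the boilerplate shown above. The following Python sketch does that; Python, the function name emit_program, and the choice to emit mov esp, ebp and xor eax, eax in the epilogue are assumptions for illustration, not part of the course code. Resetting esp from ebp discards anything the generated body left on the stack, and clearing eax returns 0 to whatever called CMAIN.

```python
# Hypothetical helper for a Bali-to-x86 code generator: wraps a list of
# generated instruction lines in the SASM boilerplate (prologue + epilogue).


def emit_program(body: list[str]) -> str:
    """Return the text of a complete SASM/NASM source file around body."""
    lines = [
        '%include "io.inc"',
        "",
        "section .text",
        "    global CMAIN",
        "",
        "CMAIN:",
        "    push ebp          ; set up frame base register",
        "    mov ebp, esp",
    ]
    # Indent generated instructions; labels (lines ending in ':') stay flush left.
    for line in body:
        lines.append(line if line.endswith(":") else "    " + line)
    lines += [
        "    mov esp, ebp      ; discard anything the body left on the stack",
        "    pop ebp           ; restore frame base register",
        "    xor eax, eax      ; return 0 to the caller",
        "    ret",
    ]
    return "\n".join(lines) + "\n"


print(emit_program(["PRINT_DEC 4, 666"]))
```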

    More complicated programs will also have a .data section, where global data such as strings is allocated. See the file for factorial in SASM.

Generating x86 code from Bali

    One way to generate x86 code from Bali programs is to generate SaM code and then translate each SaM instruction into small sequences of x86 instructions. For example, the SaM instruction ADD can be implemented by the following x86 sequence:

        pop ebx      ; ebx = value on top of the SaM stack
        pop eax      ; eax = value just below it
        add eax, ebx ; eax = eax + ebx
        push eax     ; push the sum
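    The ADD sequence generalizes to SaM's other arithmetic instructions, but operand order starts to matter. The Python sketch below assumes that SUB, DIV and MOD compute (value below the top) OP (top of stack); the function name translate_binary is hypothetical. Division needs special care on x86: idiv divides the 64-bit value in edx:eax, so cdq first sign-extends eax into edx, and the remainder comes back in edx.

```python
# Hypothetical sketch: x86 sequences for SaM's binary arithmetic instructions.
# Assumes SaM pops the top value (right operand) and the value below it
# (left operand) and pushes (left OP right), so order matters for SUB, DIV, MOD.

_SIMPLE_OPS = {"ADD": "add", "SUB": "sub", "TIMES": "imul"}


def translate_binary(sam_op: str) -> list[str]:
    """Return x86 instructions implementing one SaM binary arithmetic instruction."""
    if sam_op in _SIMPLE_OPS:
        return [
            "pop ebx",                          # right operand (top of stack)
            "pop eax",                          # left operand (just below it)
            f"{_SIMPLE_OPS[sam_op]} eax, ebx",  # eax = left OP right
            "push eax",                         # push the result
        ]
    if sam_op in ("DIV", "MOD"):
        return [
            "pop ebx",                          # divisor (top of stack)
            "pop eax",                          # dividend (just below it)
            "cdq",                              # sign-extend eax into edx:eax
            "idiv ebx",                         # eax = quotient, edx = remainder
            "push eax" if sam_op == "DIV" else "push edx",
        ]
    raise ValueError(f"not a binary arithmetic instruction: {sam_op}")


for op in ("SUB", "MOD"):
    print(op, translate_binary(op))
```

    Note that idiv truncates the quotient toward zero; if SaM's DIV and MOD are defined differently for negative operands, the sequence would need adjusting.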

    Similarly, the LINK instruction in SaM can be implemented by the sequence:

        push ebp     ; save the old frame base register
        mov ebp, esp ; new frame base = current top of stack

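    This is essentially what binary translators and just-in-time (JIT) compilers do. I haven't worked out the details, so I don't know whether there are any hidden gotchas with this approach. One likely source of trouble is that the SaM stack grows toward higher addresses while the x86 stack grows toward lower addresses, and each SaM stack cell becomes a 4-byte word on x86. For example, parameters are at negative offsets from FBR in SaM but at positive offsets from ebp in x86, so instructions that address the stack relative to the frame base need their offsets translated.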
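    The stack-direction issue shows up as soon as you translate instructions that address the stack relative to the frame base. The Python sketch below assumes that ebp plays the role of FBR, that each SaM stack cell becomes one 4-byte dword, and that the translated code keeps SaM's own order of pushes; under those assumptions the SaM cell at FBR + k is at ebp - 4*k. The function names and the small instruction subset are illustrative only.

```python
# Hypothetical sketch: translating SaM's frame-relative instructions to x86.
# Assumptions: ebp plays the role of FBR, each SaM stack cell becomes one
# 4-byte dword, and the translated code keeps SaM's own push order.
# SaM's stack grows toward higher addresses and x86's toward lower ones,
# so the SaM cell at FBR + k lives at ebp - 4*k.


def frame_operand(k: int) -> str:
    """x86 memory operand for the SaM stack cell at FBR + k."""
    disp = -4 * k
    if disp == 0:
        return "dword [ebp]"
    sign = "+" if disp > 0 else "-"
    return f"dword [ebp {sign} {abs(disp)}]"


def translate(sam_instr: str) -> list[str]:
    """Translate one SaM instruction (a small subset only) into x86 lines."""
    op, *args = sam_instr.split()
    if op == "PUSHIMM":
        return [f"push dword {int(args[0])}"]
    if op == "PUSHOFF":
        return [f"push {frame_operand(int(args[0]))}"]  # push M[FBR + k]
    if op == "STOREOFF":
        return [f"pop {frame_operand(int(args[0]))}"]   # M[FBR + k] = popped value
    if op == "LINK":
        return ["push ebp", "mov ebp, esp"]             # save FBR, set new frame base
    if op == "POPFBR":
        return ["pop ebp"]                              # restore the saved FBR
    raise ValueError(f"unsupported SaM instruction: {op}")


for instr in ["LINK", "PUSHOFF -1", "PUSHIMM 5", "STOREOFF 2", "POPFBR"]:
    print(f"{instr:12} -> {translate(instr)}")
```

    This mapping only holds while the translated code talks to other translated code. A routine that must interoperate with C functions following the standard x86 calling convention, where call pushes the return address before the callee saves ebp, finds its first parameter at [ebp + 8], so it would need different offsets.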

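    A second way to generate x86 code is to retarget your Bali compiler so that it produces x86 code directly. Here are the key points to keep in mind.

    For example, if the code generated for every expression leaves its value in register eax, the expression e1 - e2 can be compiled as follows: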

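        ; <code for e1>: leaves the value of e1 in eax
        push eax     ; save e1 on the stack, since the code for e2 will overwrite eax
        ; <code for e2>: leaves the value of e2 in eax
        pop ebx      ; ebx = value of e1
        sub ebx, eax ; ebx = e1 - e2
        mov eax, ebx ; leave the result in eax, like any other expression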
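    Direct code generation can apply the same convention recursively: every expression leaves its value in eax, and the pushed copy of e1 stays safe while e2 is evaluated, however deeply e2 is nested. The Python sketch below uses hypothetical AST classes Num and BinOp, not Bali's actual syntax tree, and handles only integer constants and +, - and *.

```python
# Hypothetical sketch of direct x86 generation for Bali-style integer
# expressions. Convention: the code for every expression leaves its value in eax.
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str        # "+", "-" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

_X86_OPS = {"+": "add", "-": "sub", "*": "imul"}


def gen_expr(e: Expr, out: list[str]) -> None:
    """Append x86 code for e to out; the generated code leaves e's value in eax."""
    if isinstance(e, Num):
        out.append(f"mov eax, {e.value}")
    elif isinstance(e, BinOp):
        gen_expr(e.left, out)                     # value of e1 in eax
        out.append("push eax")                    # save it: code for e2 overwrites eax
        gen_expr(e.right, out)                    # value of e2 in eax
        out.append("pop ebx")                     # ebx = e1
        out.append(f"{_X86_OPS[e.op]} ebx, eax")  # ebx = e1 op e2
        out.append("mov eax, ebx")                # result in eax, per the convention
    else:
        raise TypeError(f"unknown expression node: {e!r}")


code: list[str] = []
gen_expr(BinOp("-", Num(7), Num(3)), code)  # 7 - 3
print("\n".join(code))
```

    For 7 - 3 the sketch emits exactly the sequence above, with mov eax, 7 and mov eax, 3 standing in for the code for e1 and e2. For the commutative operators + and *, the final mov could be avoided by computing into eax directly (add eax, ebx); for -, the operand order forces the extra move or an alternative such as neg eax followed by add eax, ebx.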