CS 378: Assignment 1 - Memory Management

Due 11:59 PM, Thursday, September 21, 2017

Introduction

In this lab, you will write the memory management code for your operating system. Memory management has two components.

The first component is a physical memory allocator for the kernel, so that the kernel can allocate memory and later free it. Your allocator will operate in units of 4096 bytes, called pages. Your task will be to maintain data structures that record which physical pages are free and which are allocated, and how many processes are sharing each allocated page. You will also write the routines to allocate and free pages of memory.

The second component of memory management is virtual memory, which maps the virtual addresses used by kernel and user software to addresses in physical memory. The amd64 hardware's memory management unit (MMU) performs the mapping when instructions use memory, consulting a set of page tables. You will modify JOS to set up the MMU's page tables according to a specification we provide.

Getting started

To fetch the source, use

git clone cs378-vijay@git.cs.utexas.edu:<groupid>-assign1

For eg, if your group-id is group1, you will have to do

git clone cs378-vijay@git.cs.utexas.edu:group1-assign1

You can commit to the repo using git commit -m '<commit-message'>

For eg, git commit -m 'Solution to assignment1'

And then push the changes. git push origin master

Make sure to commit and push your changes before the deadline. You can do multiple commits.

NOTE: If you are working on a system other than lab machine (Different Linux version or Mac), please make sure that the code you hand-in compiles and works on lab machines. We will be evaluating your code on lab machines.

What to look at in source code?

You will need to work through the following source files for assignment 1:

memlayout.h describes the layout of the virtual address space that you must implement by modifying pmap.c.

memlayout.h and pmap.h define the PageInfo structure that you'll use to keep track of which pages of physical memory are free.

kclock.c and kclock.h manipulate the PC's battery-backed clock and CMOS RAM hardware, in which the BIOS records the amount of physical memory the PC contains, among other things. The code in pmap.c needs to read this device hardware in order to figure out how much physical memory there is, but that part of the code is done for you: you do not need to know the details of how the CMOS hardware works.

Pay particular attention to memlayout.h and pmap.h, since this assignment requires you to use and understand many of the definitions they contain. You may want to reviewinc/mmu.h, too when required, as it also contains a number of definitions that will be useful for this assignment.

Part 1: Physical Page Management

The operating system must keep track of which parts of physical RAM are free and which are currently in use. JOS manages the PC's physical memory with page granularity so that it can use the MMU to map and protect each piece of allocated memory.

JOS is "told" the amount of physical memory it has by the bootloader. JOS's bootloader passes the kernel a multiboot info structure which possibly contains the physical memory map of the system. The memory map may exclude regions of memory that are in use for reasons including IO mappings for devices (e.g., the "memory hole"), space reserved for the BIOS, or physically damaged memory. For more details on how this structure looks and what it contains, refer to the specification. A typical physical memory map for a PC with 10 GB of memory looks like below.

        e820 MEMORY MAP
            address: 0x0000000000000000, length: 0x000000000009f400, type: USABLE
            address: 0x000000000009f400, length: 0x0000000000000c00, type: RESERVED
            address: 0x00000000000f0000, length: 0x0000000000010000, type: RESERVED
            address: 0x0000000000100000, length: 0x00000000dfefd000, type: USABLE
            address: 0x00000000dfffd000, length: 0x0000000000003000, type: RESERVED
            address: 0x00000000fffc0000, length: 0x0000000000040000, type: RESERVED
            address: 0x0000000100000000, length: 0x00000001a0000000, type: USABLE
    

You'll now write the physical page allocator. It keeps track of which pages are free with a linked list of struct PageInfo objects, each corresponding to a physical page. You need to write the physical page allocator before you can write the rest of the virtual memory implementation, because your page table management code will need to allocate physical memory in which to store page tables.

In the file kern/pmap.c, you must implement code for the following functions.

        boot_alloc()
	page_init()
	page_alloc()
	page_free()
	

You also need to add some code to x64_vm_init() in pmap.c, as indicated by comments there. For now, just add the code needed before the call to check_page_alloc().

You probably want to work on boot_alloc(), then x64_vm_init(), then page_init(), page_alloc(), and page_free().

check_page_alloc() tests your physical page allocator. You should boot JOS and see whether check_page_alloc() reports success. Fix your code so that it passes. You may find it helpful to add your own assert()s to verify that your assumptions are correct.

Look for comments in the parts of the JOS source that you have to modify; those comments often contain specifications and hints.

Part 2: Virtual Memory

Virtual, Linear, and Physical Addresses

In AMD64 terminology, a virtual address consists of a segment selector and an offset within the segment. A linear address is what you get after segment translation but before page translation. A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM. Be sure you understand the difference between these three types or "levels" of addresses!

           Selector  +--------------+         +-----------+
          ---------->|              |         |           |
                     | Segmentation |         |  Paging   |
Software             |              |-------->|           |---------->  RAM
            Offset   |  Mechanism   |         | Mechanism |
          ---------->|              |         |           |
                     +--------------+         +-----------+
            Virtual                   Linear                Physical

A C pointer is the "offset" component of the virtual address. In kern/bootstrap.S, we installed a Global Descriptor Table (GDT) that effectively disabled segment translation by setting all segment base addresses to 0 and limits to 0xffffffff. Hence the "selector" has no effect and the linear address always equals the offset of the virtual address.

Extra Reading Read chapters 4 and 5 of the AMD64 Architecture Programmer's Reference Manual. Read the sections about page translation and page-based protection closely (5.1). Although JOS relies most heavily on page translation, a basic understanding of how segmentation works in long mode helps to understand what's going on in JOS.

From code executing on the CPU, once we're in protected/long mode, there's no way to directly use a linear or physical address. All memory references are interpreted as virtual addresses and translated by the MMU, which means all pointers in C are virtual addresses.

The JOS kernel often needs to manipulate addresses as opaque values or as integers, without dereferencing them, for example in the physical memory allocator. Sometimes these are virtual addresses, and sometimes they are physical addresses. To help document the code, the JOS source distinguishes the two cases: the type uintptr_t represents virtual addresses, and physaddr_t represents physical addresses. Both these types are really just synonyms for 64-bit integers (uint64_t), so the compiler won't stop you from assigning one type to another! Since they are integer types (not pointers), the compiler will complain if you try to dereference them.

The JOS kernel can dereference a uintptr_t by first casting it to a pointer type. In contrast, the kernel can't sensibly dereference a physical address, since the MMU translates all memory references. If you cast a physaddr_t to a pointer and dereference it, you may be able to load and store to the resulting address (the hardware will interpret it as a virtual address), but you probably won't get the memory location you intended.

To summarize:

C typeAddress type
T*  Virtual
uintptr_t  Virtual
physaddr_t  Physical

The JOS kernel sometimes needs to read or modify memory for which it only knows the physical address. For example, adding a mapping to a page table may require allocating physical memory to store a page directory and then initializing that memory. However, the kernel, like any other software, cannot bypass virtual memory translation and thus cannot directly load and store to physical addresses. In order to translate a physical address into a virtual address that the kernel can actually read and write, You should use KADDR(pa).

The JOS kernel also sometimes needs to be able to find a physical address given the virtual address of the memory in which a kernel data structure is stored. The kernel addresses its global variables and memory that boot_alloc() allocates, with addresses in the region where the kernel was loaded, the very region where we map all of physical memory. Thus, to turn a virtual address in this region into a physical address, You should use PADDR(va).

Reference counting

There can be same physical page mapped at multiple virtual addresses simultaneously (or in the address spaces of multiple environments). You will keep a count of the number of references to each physical page in the pp_ref field of the struct PageInfo corresponding to the physical page. When this count goes to zero for a physical page, that page can be freed because it is no longer used. In general, this count should equal to the number of times the physical page appears below UTOP in all page tables (the mappings above UTOP are mostly set up at boot time by the kernel and should never be freed, so there's no need to reference count them). We'll also use it to keep track of the number of pointers we keep to the page directory pages and, in turn, of the number of references the page directories have to page table pages.

Be careful when using page_alloc. The page it returns will always have a reference count of 0, so pp_ref should be incremented as soon as you've done something with the returned page (like inserting it into a page table). Sometimes this is handled by other functions (for example, page_insert) and sometimes the function calling page_alloc must do it directly.

Page Table Management

Now you'll write a set of routines to manage page tables: to insert and remove linear-to-physical mappings, and to create page table pages when needed.

In the file kern/pmap.c, you must implement code for the following functions.

        pml4e_walk()
        pdpe_walk()
        pgdir_walk()
        boot_map_region()
        page_lookup()
        page_remove()
        page_insert()
	

page_check(), called from x64_vm_init(), tests your page table management routines. You should make sure it reports success before proceeding.

Part 3: Kernel Address Space

JOS divides the processor's linear address space into two parts. User environments (processes), will have control over the layout and contents of the lower part, while the kernel always maintains complete control over the upper part. The dividing line is defined somewhat arbitrarily by the symbol ULIM in inc/memlayout.h, reserving approximately 256MB of linear (and therefore virtual) address space for the kernel.

You'll find it helpful to refer to the JOS memory layout diagram in inc/memlayout.h both for this part.

Permissions and Fault Isolation

Since kernel and user memory are both present in each environment's address space, we will have to use permission bits in our amd64 page tables to allow user code access only to the user part of the address space. Otherwise bugs in user code might overwrite kernel data, causing a crash or more subtle malfunction; user code might also be able to steal other environments' private data.

The user environment will have no permission to any of the memory above ULIM, while the kernel will be able to read and write this memory. For the address range (UTOP,ULIM], both the kernel and the user environment have the same permission: they can read but not write this address range. This range of address is used to expose certain kernel data structures read-only to the user environment. Lastly, the address space below UTOP is for the user environment to use; the user environment will set permissions for accessing this memory.

Initializing the Kernel Address Space

Now you'll set up the address space above UTOP: the kernel part of the address space. inc/memlayout.h shows the layout you should use. You'll use the functions you just wrote to set up the appropriate linear to physical mappings.

Fill in the missing code in x64_vm_init() after the call to page_check().

Your code should now pass the check_boot_pml4e() check.

Auto Grader

There is an auto-grader in the repo named grade-assignment.pyc Check your code and make sure that the auto grader passes. Auto-grader requires python2.7. To run the auto grader, use: python2.7 grade-assignment.pyc

Before implementing the code, the auto grader fails with a panic. Be sure to remove the line that panics in the function x64_vm_init() before you test your code.