**JavaScript Engine Exploitation**
*Project 3 for [CS 361S](.)*
*Parts A and B due: Friday, March 8th, 11:59 PM*
*Part C due: Friday, March 22nd, 11:59 PM*

# Goal

The goal of this project is to gain hands-on experience in JavaScript runtime exploitation. You will use a type confusion bug to build generic exploitation primitives, then use those primitives to create an rwx mapping from which to run a second stage payload.

Specifically, you will target the [QuickJS](https://bellard.org/quickjs/) JavaScript engine, a simple but featureful interpreter written in C, which we have modified to add a convenient type confusion bug for you.

# Background

You have already read saelo's [Phrack article on browser exploitation](https://phrack.org/issues/70/3.html), which explains the use of `addrof` and `fakeobj` as generic exploitation primitives. Saelo's article uses Safari's JavaScriptCore engine as an example; likewise [0vercl0k's blog introduction to SpiderMonkey exploitation](https://doar-e.github.io/blog/2018/11/19/introduction-to-spidermonkey-exploitation/) uses Firefox's SpiderMonkey engine as an example.

There are several excellent repositories of JavaScript runtime security resources, including:

* Markus Gaasedelen and Amy "itszn" Burnett's [JavaScript engine exploitation reading list](https://zon8.re/posts/javascript-engine-fuzzing-and-exploitation-reading-list/);
* Moritz Eckert's [Browser-Pwn](https://github.com/m1ghtym0/browser-pwn);
* [awesome-browser-exploit](https://github.com/Escapingbug/awesome-browser-exploit); and
* the [JavaScript engine vulnerability database](https://github.com/tunz/js-vuln-db).

The details of exploitation techniques will likely be engine-specific, but many of the ideas generalize.

# The Environment

We will use the class VM you installed in [exercise 1](ex1.html). In [proj3.tar.gz](proj3.tar.gz) we have supplied a patched version of QuickJS that includes the type confusion bug you will use, along with other scaffolding described below.
Inside the `quickjs` subdirectory, run `make` to compile QuickJS and `make qjs-debug` to build a debug version. Now, in the main project directory, `quickjs/qjs` will get you the release version and `quickjs/qjs-debug` will get you the debug version. Both should run fine under gdb, but the release version will be optimized in a way that will make inspecting some variables harder.

If you give qjs a script argument, it will run that script and exit instead of launching the REPL. You can ask it to execute the script and then provide a REPL by adding the `-i` or `--interactive` command-line flag.

You can run qjs under gdb. You will _definitely_ want to use [0vercl0k's trick](https://doar-e.github.io/blog/2018/11/19/introduction-to-spidermonkey-exploitation/) for setting a breakpoint in a `Math` function to trap into the debugger from the REPL. If you set a breakpoint in `cos`, then calling `Math.cos(v)` on some interesting value will trap into gdb. Go `up` one stack frame to `js_call_c_function` and the value will be available to inspect in `argv[0]`.

You probably won't need to modify and rebuild QuickJS, but you will definitely want to spend time referring to `quickjs.h`, `quickjs-libc.c`, `qjs.c`, and parts of `quickjs.c` to understand how the JavaScript runtime is implemented. If you do make changes while working on the project, be sure to also test your solutions against binaries built from unmodified source before submitting.

## A note about ASLR

Like other modern exploits, yours in this project will need to work in the presence of ASLR.

Do not hardcode the addresses of QuickJS functions, or the offsets between functions or to PLT or GOT entries in the QuickJS binary. The layout of the text segment is different between the debug and release builds, and we expect your exploit to work on both.

Do not hardcode the address of libc functions.
To save you a little work in Part C, you may assume that the variable `mprotect_mus_offset` contains the offset in libc between the `malloc_usable_size` and `mprotect` functions (see more below), but you should not hardcode the offset between any other libc functions.

Do not hardcode the address of any object on the QuickJS heap. The location of individual allocations on the heap may change between runs, for example because of allocations made in initialization.

Finally, do not hardcode any stack addresses. The location of individual frames on the stack may change between runs, for example because the environment variables in our testing setup may differ from those in yours, and the kernel reserves space for these variables at the top of the stack.

# Part A

In Part A, you will use a type confusion bug to create `addrof`, `fakeobj`, and `fakestr`.

We have added an `Object.prototype.eight` method. Called on an object, this method tweaks its properties pointer:

```
p->prop = (JSProperty *)(0x8 ^ (uint64_t)p->prop);
```

To enable `eight()`, invoke qjs with the `--parta` command-line flag.

Write a JavaScript program that uses the type confusion enabled by `eight()` and defines three functions, `addrof`, `fakeobj`, and `fakestr`. The `addrof` function should take a single object or string value and return a number corresponding to the address where the underlying object or string is stored. The `fakeobj` function should take a number and return an object value with the number argument as the object address. Similarly, the `fakestr` function should take a number and return a string value with the number argument as the string address.
For any object `o`, we should have `fakeobj(addrof(o)) === o`; for any string `s`, we should have `fakestr(addrof(s)) === s`, and for any number `i` we should have `addrof(fakeobj(i)) == i` and `addrof(fakestr(i)) == i`, though running these latter expressions for an `i` that doesn't correspond to an object or string in memory may confuse QuickJS's reference-counting garbage collector and lead to segfaults.

Some hints:

* It is possible (and useful) to invoke `eight()` multiple times on the same object. What happens?
* There is a subtlety about the `JS_VALUE_GET_TAG` macro that will make your life a lot easier than it might at first seem.

On a 64-bit CPU, virtual addresses are 64 bits, whereas JavaScript numbers are implemented as double-precision floating point and so cannot exactly represent all 64-bit integer values. But our RISC-V environment uses only 48 bits of the 64-bit address space, and doubles _can_ exactly represent all 48-bit integer values. Be aware that numbers larger than $2^{32}$ need to be handled with care: JavaScript's bitwise operations first convert their operands to 32-bit integers (signed, except for the left operand of `>>>`), discarding any information in more-significant bits. (Also now is as good a time as any to mention that JavaScript has both right arithmetic shift and right logical shift operators, `>>` and `>>>`, and that `(i >>> 0).toString(16)` is the idiomatic JavaScript way to print a number `i` as an unsigned hexadecimal 32-bit value.)

# Part B

In Part B, you will use the generic primitives `addrof`, `fakeobj`, and `fakestr` to build the generic primitives `readmem` and `writemem`.

Write a JavaScript program that uses `addrof`, `fakeobj`, and `fakestr` as described above (but no additional type confusion) and defines two functions, `readmem` and `writemem`. The `readmem` function should take a single number and return the value of the *four-byte word* in memory at the address in the argument, as an *unsigned* number between 0 and $2^{32}-1$.
The `writemem` function should take two arguments, a number representing an address and a number representing a (32-bit, unsigned) value, and write the value in the second argument to memory at the *four-byte* word at the address in the first argument.

The natural way to implement `readmem(addr)` and `writemem(addr, val)` is to have a `Uint32Array` object `ta` whose `u.array.u.uint32_ptr` field contains `addr`; then `readmem` is reading `ta[0]` and `writemem` is setting `ta[0] = val`. You could use `fakeobj` to create a new typed array of this sort for each `readmem` and `writemem` call, but you'll be calling those a lot in Part C, so making a fake array each time will get tedious and risks triggering the garbage collector and causing a crash.

A better strategy is to have two typed arrays. One, the "data plane," will be used to read and write memory, with its `u.array.u.uint32_ptr` set to the right value before each access. The other, the "control plane," will have its `u.array.u.uint32_ptr` field point to the data plane object, so that writes to the right offset in the control plane change the data plane's `uint32_ptr` field. The layout is something like this:

************************************************************************
*                                                                      *
*  .---------------.     .---->.---------------.                       *
*  | control plane |     |     |  data plane   |                       *
*  |  typed array  |     |     |  typed array  |     .---> memory      *
*  |               |     |     |               |     |                 *
*  |               |     |     |               |     |                 *
*  |  uint32_ptr   o-----'     |  uint32_ptr   o-----'                 *
*  |               |           |               |                       *
*  '---------------'           '---------------'                       *
*                                                                      *
************************************************************************

The data plane array could start out as a regular `Uint32Array`; only the control plane array needs to be faked. We recommend that your script creates both of these typed arrays once, at startup, and then your `readmem` and `writemem` implementations reuse the arrays.

There is also the question of how to convincingly fake an object such as a typed array.
In QuickJS, such objects have too many pointers to other data structures (for example, to the shape object we explored in class) to be conjured out of nothing. You are much better off examining a real typed array as created by the QuickJS runtime and creating a clone of it with fields changed as necessary.

But how will you examine a typed array if you need to fake a typed array to create `readmem`? One answer is to use strings. QuickJS's `JSString` data structure is much simpler than its `JSObject` data structure. You should be able to fake a `JSString`. What's more, QuickJS stores string contents inline with the `JSString` header, using C's flexible array member feature (see pages 105-107 of [the draft C23 standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf)). That means that:

* If you know the address of a `JSString` (with the help of `addrof`, for example), you also know the address of the bytes that make up the string itself, without having to read and follow a pointer; and
* If you fake a `JSString` (with the help of `fakestr`, for example) that claims to represent a long string, you can use it to read memory from the heap in locations contiguous with the string. If you get lucky and a typed array gets placed there (how would you know?), you can read that typed array's `JSObject` header.

Some hints:

* Read the above text carefully. It represents our suggestion for the easiest approach to a Part B solution. You don't have to follow our approach, but if you deviate from it, be sure to do so for a concrete reason!
* Keep in mind the [`String.prototype.charCodeAt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt) and [`String.fromCharCode`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCharCode) functions, along with other convenient string routines like [`String.prototype.substring`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring) and string operators like `+`, concatenation.
* QuickJS strings can be encoded using one-byte characters or two-byte wide characters (UTF-16). You want to avoid UTF-16. To escape unprintable single-byte characters, use `\x[dig][dig]`, where each `[dig]` is a hexadecimal digit. Take a look at `secondstage.js` for an example.
* Don't forget that in setting string contents you can combine string constants (like `"hello"`) with strings constructed dynamically by your script depending on values observed during execution.

To be extra clear, the strategy recommended above calls for you to create *two* "host" strings. The first host string has a fake string header as its contents; with the help of `fakestr`, you use this fake string header to read memory that comes after the string, to look for a typed array header to copy. The second host string has a fake typed array header as its contents; with the help of `fakeobj`, you turn this fake typed array header into the control plane typed array.

To allow you to work on Part B regardless of whether your Part A solution is done, we have provided builtin versions of `addrof`, `fakeobj`, and `fakestr`, as `cs361s.addrof`, `cs361s.fakeobj`, and `cs361s.fakestr`. To enable these builtins instead of `eight()`, invoke qjs with the `--partb` command-line flag.
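
The string hints above come down to moving numbers in and out of strings whose characters each hold one byte. The following is a minimal sketch of that byte packing in plain JavaScript, not a solution to any part of the project. The helper names `packLE` and `unpack32` and the sample address are hypothetical. Whether the engine actually keeps such a string in its one-byte representation is an assumption you should check in gdb.

```javascript
// Hypothetical helpers; these names are not part of the project scaffolding.

// Encode a non-negative integer below 2^53 as `nbytes` little-endian bytes,
// one byte per character, so every char code stays in the range 0..255.
function packLE(value, nbytes) {
    let out = "";
    for (let i = 0; i < nbytes; i++) {
        out += String.fromCharCode(value % 256);
        // Divide instead of shifting: >> would discard bits above bit 31.
        value = Math.floor(value / 256);
    }
    return out;
}

// Read the little-endian 32-bit word that starts at character index `off`.
// The final `>>> 0` turns the signed 32-bit result into an unsigned number.
function unpack32(s, off) {
    return (s.charCodeAt(off) |
            (s.charCodeAt(off + 1) << 8) |
            (s.charCodeAt(off + 2) << 16) |
            (s.charCodeAt(off + 3) << 24)) >>> 0;
}

const addr = 0x3ff7a4c010;      // made-up address that fits in 48 bits
const bytes = packLE(addr, 8);  // 8-character string, low byte first
const lo = unpack32(bytes, 0);  // low 32 bits of addr
const hi = unpack32(bytes, 4);  // high 32 bits of addr
```

Splitting an address into `lo` and `hi` halves this way, and recombining them as `hi * 2 ** 32 + lo`, avoids applying a bitwise operator to a value that does not fit in 32 bits.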
When we test your Part B solution, we will add the following preamble to your `partb.js` solution script:

```
let addrof = cs361s.addrof;
let fakeobj = cs361s.fakeobj;
let fakestr = cs361s.fakestr;
```

You can do the same in your testing, but make sure that these lines are _not_ part of the `partb.js` solution you submit, because that will make it harder for us to test your end-to-end solution.

# Part C

In Part C, you will use the generic primitives `readmem` and `writemem` (along with `addrof`, for convenience) to create an rwx mapping, populate it with a second-stage payload, and then execute that payload.

Linux will let programs change memory protections after the fact using the [`mprotect`](https://man7.org/linux/man-pages/man2/mprotect.2.html) system call. It is probably easiest to use this system call to mark a page on the heap rwx using flags `PROT_READ | PROT_WRITE | PROT_EXEC`. Either mark the page that holds the `second_stage` string rwx or copy the string's contents into the page you have set up. Then jump to the rwx copy of the second-stage payload to execute it.

How you structure your Part C solution is up to you. You could choose to implement it in full return-oriented style, in which case you will want to consult the [two](https://arxiv.org/abs/2007.14995) [papers](https://arxiv.org/abs/2103.08229) on RISC-V ROP. But you may find it easier to keep more of the logic in JavaScript and arrange to misuse QuickJS mechanisms for calling C functions from JavaScript to call the libc `mprotect` wrapper. This is the approach we recommend.

Some hints:

* If you disassemble `js_call_c_function`, you will see that, in most cases, it invokes the C function with `a0`, `a1`, and `a2` set to `realm`, the value field of `this`, and the tag field of `this`, respectively. (But not in the `JS_CFUNC_f_f` and `JS_CFUNC_f_f_f` cases used for the math functions -- those convert their arguments to doubles and pass them in the floating point registers.)
* You can either make a new, fake function object or you can hijack an existing but rarely-called function; `console.log`, which points to `js_print`, is a good choice.
* Don't forget about [`Function.prototype.apply`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/apply).
* To find addresses in the qjs text segment, look for a function pointer accessible from the JavaScript heap to a qjs function. There are lots of them; one example is `console.log`'s `cfunc`, which points to `js_print`. But keep in mind that the text segment is laid out differently in qjs and qjs-debug, so you should not assume that two functions are at some specific distance from each other.
* To find addresses in libc, look for a function pointer accessible from the JavaScript heap to a libc function. One example is `malloc_usable_size`. There is no direct pointer to the libc function `malloc_usable_size` in the QuickJS heap, but there is a pointer to the QuickJS function `js_def_malloc_usable_size`, reachable from `ctx->rt->mf.js_malloc_usable_size`. (Where can you get a pointer to the context?) This function is just a wrapper around `malloc_usable_size`, so if you disassemble it you will quickly find a `j` or `jal` instruction (depending on whether you are targeting `qjs` or `qjs-debug`) that transfers control to the `malloc_usable_size` entry in QuickJS's [PLT](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#procedure-linkage-table). Parse the PLT stub to find the corresponding [GOT](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#global-offset-table) entry, and that entry will contain a pointer to libc itself, at least assuming the function has already been called so that the dynamic linker has had a chance to resolve it.
(The best reference by far on linkers and loaders, if you're curious, is [John Levine's book](https://www.iecc.com/linker/); also see Frederic Cambus' [toolchain resource page](https://www.toolchains.net/).)
* To find addresses on the stack, you could look for the `environ` symbol, stored immediately after the GOT.

To allow you to work on Part C regardless of whether your Parts A and B solutions are done, we have provided builtin versions of `readmem`, `writemem`, and `addrof`, as `cs361s.readmem`, `cs361s.writemem`, and `cs361s.addrof`. To enable these builtins instead of `eight()`, invoke qjs with the `--partc` command-line flag.

When we test your Part C solution, we will add the following preamble to your `partc.js` solution script:

```
let readmem = cs361s.readmem;
let writemem = cs361s.writemem;
let addrof = cs361s.addrof;
```

You can do the same in your testing, but make sure that these lines are _not_ part of the `partc.js` solution you submit, because that will make it harder for us to test your end-to-end solution.

Our preamble will also define a string `second_stage` containing the (RISC-V instruction binary) payload to execute in the second exploit stage. You should not assume anything about this string except that it is less than 4096 bytes long, so a single rwx page is enough.

Finally, our preamble will define a variable `mprotect_mus_offset` that we will set to the offset in our testing environment from the `malloc_usable_size` function in libc to the `mprotect` function in libc. If you use the strategy recommended above and this variable to find `mprotect`, your exploit should work in our autograder even if our libc ended up with a slightly different layout than the one you used in developing your exploit. You can find the right value for `mprotect_mus_offset` in your environment with the gdb command

```
p/x (uint64_t)mprotect - (uint64_t)malloc_usable_size
```

when you are debugging `qjs-debug`.

# Logistics

You will submit using Gradescope.
You should submit a zip file of your solution, without directory structure. Your solution should include at least the following files:

* `parta.js`: This is your solution to Part A. It should define functions `addrof`, `fakeobj`, and `fakestr` at global scope.
* `partb.js`: This is your solution to Part B. It should define functions `readmem` and `writemem` at global scope. It should _not_ include the preamble in `preambleb.js`; we will add that ourselves when we test your Part B solution in isolation.
* `partc.js`: This is your solution to Part C. It should assume that a string called `second_stage` is defined at global scope and cause the contents of that string to execute in an rwx memory region. It should _not_ include the preamble in `preamblec.js` or any definition of `second_stage`; we will add those ourselves when we test your Part C solution.

Along with each of these files, you should also submit `writeupa.txt`, `writeupb.txt`, and `writeupc.txt`, briefly explaining how you came to the solution for each of the three parts.

# Grading

The project will be graded out of 10 points.

We will grade each part in isolation, using the builtin functions implementing the underlying primitives. So, for example, we will grade your solution to Part B using the builtin implementations (`cs361s.addrof`, etc.) and the script preamble `preambleb.js` to expose them as `addrof`, etc. Each part graded in isolation is worth 3 points.

We will assign the final, tenth point based on whether your solutions to all three parts work together. That is, whether you have caused qjs to execute the second-stage payload given just the `eight()` type confusion bug from Part A, without the builtins supplied in Parts B and C. We will form the end-to-end solution by concatenating your `parta.js`, `partb.js`, and `partc.js` files.

We will test Part C and the end-to-end solution, at least, on both the qjs and qjs-debug binaries. Pay attention to the discussion of ASLR above.
We will test Part C and the end-to-end solution at least on the encoded hello-world payload in `secondstage.js`, but we may also test Part C on other payloads.

Scoring is based on functionality in our testing, though solutions that violate the spirit of the assignment (e.g., use an unrelated bug) may be docked points.

# Thanks

Greetz to qwertyoruiopz, Fire30, and the other participants in the [crackbykim IRC challenge](https://web.archive.org/web/20191010183356/http://rce.party/cracksbykim-quickJS.nfo) for QuickJS; and to 0vercl0k, __x86, bkth, itszn, natashenka, niklasb, saelo, tsuro, and others in the JavaScript JIT security community.