**Library Sandboxing**
*Exercise 5 for [CS 361S](.)*
*Due: Friday, March 29th, 11:59 PM*

# Goal

The goal of this exercise is to gain hands-on experience with library sandboxing and with exploiting sandboxed libraries using confused deputy attacks. Specifically:

- You are provided a C++ application that uses a vulnerable, unsandboxed C library.
- Your task is to (1) retrofit the application to run the library in a sandbox (without validating values from the sandbox), (2) turn a vulnerability in the sandboxed library into an application exploit, and (3) properly validate all data coming from the sandbox to prevent such exploitation.

# The Environment

We will use the class VM you installed in [exercise 1](ex1.html). In addition to the development tools you have already installed, we'll need a C++ compiler and runtime libraries. As root, run `apt install g++` (you may need to run `apt update` first).

Download the starter tarball, [ex5.tar.gz](ex5.tar.gz), and unpack it. The tarball contains:

- *The C++ application:* The source for the application is in `app.cpp`. **This is the only file you will modify throughout this assignment.**
- *Build files:* The included `Makefile` (which you should *not* modify) can build the application two ways: one, as `app`, without any sandboxing; another, as `app_sandboxed`, with sandboxing. The second recipe will not succeed until you've added to `app.cpp` the calls necessary to invoke library functions through RLBox.
- *The C library:* The `lib.c` and `lib.h` files (which you should not modify) are the source code for the library. The unsandboxed version of the application, `app`, links against these files directly. We have also supplied a WebAssembly-sandboxed version of the library compiled for use with the [RLBox wasm2c sandbox](https://github.com/PLSysSec/rlbox_wasm2c_sandbox) as `lib.wasm.a` (plus a header in `lib.wasm.h`).
  The sandboxed version of the application, `app_sandboxed`, links against this version of the library, which ensures that library functions are executed inside a sandbox.
- *Dependencies:* The `include` directory contains the RLBox headers necessary for compiling the application with the wasm2c sandbox. Do *not* modify these files.

You should not need any tools installed in your VM beyond those we used in previous projects.

## Building the Project

Run `make app` to build the unsandboxed application. Run `make app_sandboxed` to build the sandboxed application (this will not build until you have sandboxed the library).

# Building, Breaking, Building Better

## Part 1: Isolating the library

We have provided you with an application that uses a contrived hashing library. Among other things, the application directly calls the `get_hash` library function with command line arguments, and prints the resulting hash. The first argument is the string to hash and the second argument is the completion message. For example, after we build the application we can run it as follows:

```
$ ./app "I love to hash strings" Completed!
VERSION: 1.0.0
Done: Completed!
Hash = 1ccaad83
```

This library is buggy, though, and as an attacker you can feed the application (and, in turn, the library) bad input that will cause it to crash:

```
$ ./app AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Completed!
VERSION: 1.0.0
Done: Completed!
Segmentation fault
```

In part 1, you will modify `app.cpp` to isolate the library from the rest of the application. This will ensure that the bug in the library cannot directly compromise the host application. To do this, you will be using the RLBox sandboxing tool.
We have included all the necessary header files and RLBox boilerplate in `app.cpp`, but the app just calls library functions directly. You will change that. You will want to read the [RLBox documentation page](https://rlbox.dev/) and check out the [accompanying sandboxing examples](https://github.com/PLSysSec/rlbox-book/tree/main/src/chapters/examples).

Note that, because of the boilerplate line

```
RLBOX_DEFINE_BASE_TYPES_FOR(lib, wasm2c);
```

in `app.cpp`, RLBox provides us a customized `rlbox_sandbox_lib` type for the sandbox, along with a customized `tainted_lib<>` template type for values read from the sandbox.

**Subtasks**

We recommend tackling this part in several steps:

1. Declare a sandbox variable `sandbox` and call its `.create_sandbox()` method at the beginning of `main` to initialize the sandbox; at the end of `main`, call the sandbox variable's `.destroy_sandbox()` method to destroy the sandbox.
2. Wrap all library calls using `invoke_sandbox_function`. Remember that all inputs to sandboxed function calls must be tainted (e.g., the type of a tainted `char*` is `tainted_lib<char*>`) and that all return values are tainted. You can find an example of a string being copied into the sandbox and being used as an input [in the RLBox hello world example program](https://github.com/PLSysSec/rlbox-book/blob/main/src/chapters/examples/wasm-hello-example/main.cpp#L52-L61).
3. Rewrite the `on_completion` callback (the function that is called by the library) to have type:

   ```
   void on_completion(rlbox_sandbox_lib& _, tainted_lib<char*> tainted_result)
   ```

   It would be unsafe for the sandboxed library to be able to call arbitrary functions in the application. So, you must also register this function via `register_callback` so that the library can call it.
4. The modified program will not compile, because both `main` and `on_completion` operate on values read from the library, which are now tainted. In part 3 of the exercise, you will add code to check these values.
   For now, use the `UNSAFE_unverified()` method to get the values without checking them.
5. Test that the valid input above still gives the same result.
6. Test that the crashing input above does not crash the application.

*Note*: Since the library is isolated, it won't be able to access pointers in the application. For example, it can't access `argv[1]`. Instead, you need to allocate a buffer in the sandbox using `malloc_in_sandbox()` and copy values from the application into that buffer.

Once you have sandboxed the application and the crashing input no longer causes the program to segfault, copy `app.cpp` to `sol1.cpp`; you will turn this in.

## Part 2: Using the isolated library to break the application

In part 1, you sandboxed the library so that a failure in the library cannot directly compromise the host application. But what if the sandbox returns bad data that then confuses the application into compromising itself?

In this part, your goal is to find a confused deputy vulnerability in your application code that the compromised sandboxed library can use to compromise the host application. This is possible because in part 1 you used `UNSAFE_unverified()` to untaint data coming from the sandbox. This function just removes the taint; it does not validate the data received from the library.

Once you find the vulnerability, your task is to craft arguments to the program that can be used to crash the sandboxed application. In `sol2.txt`, you need to give us just the arguments. For example, if `./app aaaa bbb` crashes, then the contents of `sol2.txt` should be only `aaaa bbb`. (Make sure it's in this format since we'll be automatically grading this.)

## Part 3: Isolating the library correctly

In part 3, you will improve your application by actually validating data copied from the sandboxed library. If the library returns erroneous data, this input validation should prevent the erroneous data from crashing or otherwise compromising the host application.
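
The difference between removing taint and validating it can be sketched outside RLBox. The toy model below is only an illustration under stated assumptions: `ToyTainted`, `checked_index`, and the bound `kTableSize` are hypothetical names that do not appear in `app.cpp`, `lib.c`, or RLBox, and the class is not how RLBox implements tainted values. It borrows the method names `UNSAFE_unverified` and `copy_and_verify` only to show that the first returns the raw value, while the second lets a caller-supplied verifier decide what reaches the application.

```cpp
#include <cassert>   // only needed if you add assertions when experimenting
#include <cstddef>
#include <cstdio>
#include <utility>

// Toy stand-in for a tainted value. This is NOT RLBox's implementation; it only
// models the idea that untrusted data must be unwrapped before it can be used.
template <typename T>
class ToyTainted {
 public:
  explicit ToyTainted(T value) : value_(value) {}

  // Removes the wrapper without checking anything.
  T UNSAFE_unverified() const { return value_; }

  // Passes a copy to the verifier; only the verifier's result reaches the caller.
  template <typename Verifier>
  auto copy_and_verify(Verifier verifier) const
      -> decltype(verifier(std::declval<T>())) {
    return verifier(value_);
  }

 private:
  T value_;
};

// Hypothetical contract: the application uses the value to index a 16-entry table.
constexpr long long kTableSize = 16;

// Stores the index in `out` and returns true only if it is within the table;
// otherwise prints an error in the assignment's format and leaves `out` untouched.
bool checked_index(const ToyTainted<long long>& tainted, std::size_t& out) {
  return tainted.copy_and_verify([&out](long long v) -> bool {
    if (v < 0 || v >= kTableSize) {
      std::fprintf(stderr, "ERROR: INVALID index CAUGHT\n");
      return false;
    }
    out = static_cast<std::size_t>(v);
    return true;
  });
}
```

In this model, `checked_index(ToyTainted<long long>(3), idx)` succeeds and stores 3 in `idx`, while a value of -1 or 16 is rejected with the error message. Calling `UNSAFE_unverified()` on the same wrapper would have handed 16 straight to whatever code indexes the table, which is the kind of gap a confused deputy attack relies on.
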
After you have completed this portion, you will no longer use any `UNSAFE_unverified()` calls.

In writing your validators, you should think not just about how specific inputs can cause `lib.c` to misbehave but about what the implicit contract between the library and the application is. Your validator functions should make that contract explicit and enforce it.

If your validator functions encounter values that are invalid or unsafe, you should print an error message of the form "ERROR: INVALID _[val]_ CAUGHT", where "_[val]_" is replaced with the name of the library output being validated. This will help our grading scripts.

You will want to replace your calls to `UNSAFE_unverified()` with `copy_and_verify` for tainted primitives (e.g., `long long`s) and `copy_and_verify_string` for tainted strings. You can find an example of a string verifier [here](https://github.com/PLSysSec/rlbox-book/blob/main/src/chapters/examples/wasm-hello-example/main.cpp#L81-L84).

Once you are done, copy `app.cpp` to `sol3.cpp`.

# Notes and hints

- Read the RLBox docs and examples mentioned above.
- Invoking a sandbox function requires all of its arguments to be tainted.
- You will need to change the type signature of `on_completion()`.
- It is probably not a good idea to try to interpret the huge compiler errors that RLBox will throw at you. A better idea would be to look at your code and try to work out what values are tainted at which point.
- Even though the return value of `get_hash` is an integer, it still needs to be checked. What is the range of values that `compute_hash` seems to have been designed to output? What is the range of values it actually *can* output? (You should be able to come up with an `argv[1]` input to `main` that causes `get_hash` to return an out-of-intended-bounds hash result.)

# Logistics

You will submit using Gradescope. You should submit a zip file of your solution, without directory structure.
Your solution should include at least the following files:

* `sol1.cpp`: This is your first sandboxed version of `app.cpp`. The provided crashing inputs should not cause a segfault in this (now sandboxed) application.
* `sol2.txt`: This should contain the command-line arguments to your now-sandboxed application (`sol1.cpp`) that cause the application to abort. We will test it against our reference solution.
* `sol3.cpp`: This will be your improved version of `sol1.cpp`, which addresses the confused deputy attack vector from part 2. Neither the original vulnerability nor the vulnerability you exploited in part 2 should cause this version of the application to segfault or abort. We will test your `sol3.cpp` against our reference solution to parts 1 and 2.

Along with each of these files, you should also submit `writeup1.txt`, `writeup2.txt`, and `writeup3.txt`, briefly explaining how you came to the solution for each of the three parts. In particular, for `writeup2.txt`, explicitly explain why the arguments to the program cause the application to segfault even though you sandboxed the library.

# Grading

The exercise will be graded out of 10 points. These will be assigned as follows:

* One point for security of `sol1.cpp`. We will try inputs that cause the unsandboxed library to crash and make sure they don't trigger segfaults in your sandboxed application. (It's okay if they trigger aborts because of failed WebAssembly bounds checks.)
* Two points for the correctness of `sol1.cpp`. We will check that your app behaves the same on valid inputs as the original, unsandboxed application.
* Two points if `sol2.txt` causes our reference solution to part 1 to crash.
* Three points for security of `sol3.cpp`.
  We will try inputs that cause the library to confuse the original, unsandboxed application or that cause the library to violate the implicit library-application contract, and make sure that your validators catch the erroneous values and prevent the sandboxed application from crashing or misbehaving.
* Two points for the correctness of `sol3.cpp`. We will check that your app still behaves the same on valid inputs as the original, unsandboxed application, even with validators added.

# Acknowledgements

This exercise is an updated version of [PA3](https://cseweb.ucsd.edu/~dstefan/cse127-fall21/pa/pa3.html) from UCSD's [Fall 2021 CSE 127](https://cseweb.ucsd.edu/~dstefan/cse127-fall21/). Thanks to Deian Stefan and the CSE 127 TAs. Greets to Shravan.