**Library Sandboxing**
*Exercise 5 for [CS 361S](.)*
*Due: Friday, March 29th, 11:59 PM*
# Goal
The goal of this exercise is to gain hands-on experience with library
sandboxing and with exploiting sandboxed libraries using confused
deputy attacks. Specifically:
- You are provided a C++ application that uses a vulnerable,
unsandboxed C library.
- Your task is to (1) retrofit the application to run the library in a
sandbox (without validating values from the sandbox), (2) turn a
vulnerability in the sandboxed library into an application exploit,
and (3) properly validate all data coming from the sandbox to
prevent such exploitation.
# The Environment
We will use the class VM you installed in [exercise 1](ex1.html).
In addition to the development tools you have already installed, we'll
need a C++ compiler and runtime libraries. As root, run `apt install
g++` (you may need to run `apt update` first).
Download the starter tarball, [ex5.tar.gz](ex5.tar.gz) and unpack
it. The tarball contains:
- *The C++ application:* The source for the application is in `app.cpp`.
**This is the only file you will modify throughout this assignment.**
- *Build files:* The included `Makefile` (which you should *not*
modify) can build the application two ways: one, as `app`, without
any sandboxing; another, as `app_sandboxed`, with sandboxing. The
second recipe will not succeed until you've added to `app.cpp` the
calls necessary to invoke library functions through RLBox.
- *The C library:* The `lib.c` and `lib.h` files (which you should not
modify) are the source code for the library. The unsandboxed
version of the application, `app`, links against these files
directly. We have also supplied a WebAssembly-sandboxed version of
the library compiled for use with the [RLBox wasm2c
sandbox](https://github.com/PLSysSec/rlbox_wasm2c_sandbox) as
`lib.wasm.a` (plus a header in `lib.wasm.h`). The sandboxed version
of the application, `app_sandboxed`, links against this version of
the library, which ensures that library functions are executed
inside a sandbox.
- *Dependencies:* The `include` directory contains the RLBox headers
necessary for compiling the application with wasm2c sandboxing.
Do *not* modify these files.
Beyond `g++`, you should not need any tools installed in your VM other
than those we used in previous projects.
## Building the Project
Run `make app` to build the unsandboxed application. Run `make
app_sandboxed` to build the sandboxed application (this will not build
until you have sandboxed the library).
# Building, Breaking, Building Better
## Part 1: Isolating the library
We have provided you with an application that uses a contrived hashing
library. Among other things, the application directly calls the
`get_hash` library function with command line arguments, and prints
the resulting hash. The first argument is the string to hash and the
second argument is the completion message.
For example, after we build the application we can run it as such:
```
$ ./app "I love to hash strings" Completed!
VERSION: 1.0.0
Done: Completed!
Hash = 1ccaad83
```
This library is buggy, though, and as an attacker you can feed the
application (and, in turn, the library) bad input that will cause it
to crash:
```
$ ./app AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Completed!
VERSION: 1.0.0
Done: Completed!
Segmentation fault
```
In part 1, you will modify `app.cpp` to isolate the library from the
rest of the application. This will ensure that the bug in the library
cannot directly compromise the host application. To do this, you will
be using the RLBox sandboxing tool.
We have included all the necessary header files and RLBox boilerplate
in `app.cpp`, but the app just calls library functions directly. You
will change that.
You will want to read the [RLBox documentation
page](https://rlbox.dev/) and check out the [accompanying sandboxing
examples](https://github.com/PLSysSec/rlbox-book/tree/main/src/chapters/examples).
Note that, because of the boilerplate line
```
RLBOX_DEFINE_BASE_TYPES_FOR(lib, wasm2c);
```
in `app.cpp`, RLBox provides us a customized `rlbox_sandbox_lib`
type for the sandbox, along with a customized `tainted_lib<>` template
type for values read from the sandbox.
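RLBox's real tainted types are far richer than this, but the core idea can be modeled in plain C++: a wrapper that hides its value and only hands it out through an explicitly unsafe accessor or through a verifier. The `ToyTainted` class below is hypothetical; it only borrows the names `UNSAFE_unverified()` and `copy_and_verify()` to make the analogy clear and is not RLBox's implementation.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an RLBox tainted type: there is no implicit
// conversion to T, so every use site must choose how to untaint the value.
template <typename T>
class ToyTainted {
 public:
  explicit ToyTainted(T value) : value_(std::move(value)) {}

  // Escape hatch: returns the raw value with no checks at all.
  T UNSAFE_unverified() const { return value_; }

  // Hands the raw value to a caller-supplied verifier and returns whatever
  // the verifier returns (the value itself, or a safe fallback).
  template <typename Verifier>
  auto copy_and_verify(Verifier verifier) const {
    return verifier(value_);
  }

 private:
  T value_;
};
```

Because `ToyTainted<int>` has no conversion operator, `int x = t;` does not compile. That mirrors why your part 1 code will not build until every tainted value is explicitly untainted.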
**Subtasks** We recommend tackling this part in several steps:
1. Declare a sandbox variable `sandbox` and call its
`.create_sandbox()` method at the beginning of `main` to initialize
the sandbox; at the end of `main`, call the sandbox variable's
`.destroy_sandbox()` method to destroy the sandbox.
2. Wrap all library calls using `invoke_sandbox_function`. Remember
that all inputs to sandboxed function calls must be tainted (e.g.,
the type of a tainted `char*` is `tainted_lib<char*>`) and that all
return values are tainted. You can find an example of a string
being copied into the sandbox and being used as an input [in the
RLBox hello world example
program](https://github.com/PLSysSec/rlbox-book/blob/main/src/chapters/examples/wasm-hello-example/main.cpp#L52-L61).
3. Rewrite the `on_completion` callback---the function that is called by the
library---to have type:
```
void on_completion(rlbox_sandbox_lib& _,
tainted_lib tainted_result)
```
It would be unsafe for the sandboxed library to be able to call arbitrary
functions in the application, so you must also register this function
with `register_callback` so that the library can call it.
4. The modified program will not compile, because both `main` and
`on_completion` operate on values read from the library, which are
now tainted. In part 3 of the exercise, you will add code to check
these values. For now, use the `UNSAFE_unverified()` method to get
the values without checking them.
5. Test that the valid input above still gives the same result.
6. Test that the crashing input above does not crash the application.
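The reason for `register_callback` in step 3 can be modeled without RLBox: sandboxed code never holds raw host function pointers, only indices into a table that the host fills in, and a request for an index the host never registered is refused. The `CallbackTable` class below is a hypothetical illustration of that idea (roughly analogous to a WebAssembly function table), not how RLBox actually implements callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: the "library" can refer to host callbacks only by index.
class CallbackTable {
 public:
  using Callback = std::function<void(const std::string&)>;

  // Host side: explicitly expose one function and hand back its index.
  std::size_t register_callback(Callback cb) {
    table_.push_back(std::move(cb));
    return table_.size() - 1;
  }

  // "Library" side: ask for a call by index. Unregistered indices are
  // rejected instead of being treated as arbitrary code addresses.
  bool invoke(std::size_t index, const std::string& arg) const {
    if (index >= table_.size()) return false;
    table_[index](arg);
    return true;
  }

 private:
  std::vector<Callback> table_;
};
```

The design point is that the host, not the library, decides which entry points exist; the library can only choose among them.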
*Note*: Since the library is isolated, it cannot access pointers into
the application's memory. For example, it can't access
`argv[1]`. Instead, you need to allocate a buffer in the sandbox using
`malloc_in_sandbox()` and copy values from the application into that
buffer.
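A WebAssembly sandbox can only address its own linear memory, so a host pointer such as `argv[1]` means nothing inside it; sandbox "pointers" are effectively offsets into that memory. The toy model below (a hypothetical `ToyLinearMemory` with a naive bump allocator, not RLBox code) shows the two steps the note describes: reserve room for the string plus its terminating NUL inside sandbox memory, then copy the bytes in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical model of a sandbox's linear memory: the sandbox can only
// touch bytes inside bytes_, identified by offsets rather than host pointers.
class ToyLinearMemory {
 public:
  explicit ToyLinearMemory(std::size_t size) : bytes_(size, '\0') {}

  // Naive bump allocator; returns an offset, or nothing if memory is full.
  std::optional<std::size_t> malloc_in_sandbox(std::size_t n) {
    if (n > bytes_.size() - next_) return std::nullopt;
    std::size_t offset = next_;
    next_ += n;
    return offset;
  }

  // Copy a host C string, including its NUL terminator, into sandbox memory.
  std::optional<std::size_t> copy_string_in(const char* host_str) {
    std::size_t len = std::strlen(host_str) + 1;  // +1 for '\0'
    auto offset = malloc_in_sandbox(len);
    if (!offset) return std::nullopt;
    std::memcpy(bytes_.data() + *offset, host_str, len);
    return offset;
  }

  // Host-side view of a sandbox offset (caller must pass a valid offset).
  const char* at(std::size_t offset) const { return bytes_.data() + offset; }

 private:
  std::vector<char> bytes_;
  std::size_t next_ = 0;
};
```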
Once you have sandboxed the application and the crashing input no
longer causes the program to segfault, copy `app.cpp` to `sol1.cpp`;
you will turn this in.
## Part 2: Using the isolated library to break the application
In part 1, you sandboxed the library so that a failure in the library
cannot directly compromise the host application. But what if the
sandbox returns bad data that then confuses the application into
compromising itself?
In this part, your goal is to find a confused deputy vulnerability in
your application code that the compromised sandboxed library can use
to compromise the host application. This is possible because in part
1 you used `UNSAFE_unverified()` to untaint data coming from the
sandbox. This function just removes the taint; it does not validate
the data received by the library.
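Here is a generic illustration of the pattern, deliberately unrelated to the specific bug you need to find: suppose host code uses a number returned by the library as an index into a host-owned array. The sandbox bounds-checks the library's own memory accesses, but `kLabels[index]` runs in the host, outside the sandbox, so an unchecked index makes the host the "deputy" that performs the out-of-bounds read. The names `kLabels` and `lookup_checked` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical host-owned data that the sandboxed library must never touch.
const std::array<const char*, 4> kLabels = {"zero", "one", "two", "three"};

// Vulnerable pattern (do not do this): untrusted_index came from the sandbox,
// and indexing without a check lets the library trigger an out-of-bounds
// read in host memory.
//   const char* lookup_unchecked(std::size_t untrusted_index) {
//     return kLabels[untrusted_index];
//   }

// Checked version: reject anything outside the range the host expects.
std::optional<const char*> lookup_checked(std::size_t untrusted_index) {
  if (untrusted_index >= kLabels.size()) return std::nullopt;
  return kLabels[untrusted_index];
}
```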
Once you find the vulnerability, your task is to craft arguments to
the program that can be used to crash the sandboxed application. In
`sol2.txt`, give us just the arguments. For example, if
`./app aaaa bbb` crashes, then `sol2.txt` contents should only be
`aaaa bbb`. (Make sure it's in this format since we'll be
automatically grading this.)
## Part 3: Isolating the library correctly
In part 3, you will improve your application by actually validating
data copied from the sandboxed library. If the library returns
erroneous data, this input validation should prevent the erroneous
data from crashing or otherwise compromising the host application.
After you have completed this portion, your code should no longer call
`UNSAFE_unverified()` anywhere.
In writing your validators, you should think not just about how
specific inputs can cause `lib.c` to misbehave but about what the
implicit contract between the library and the application is. Your
validator functions should make that contract explicit and enforce it.
If your validator functions encounter values that are invalid or
unsafe, you should print an error message of the form "ERROR: INVALID
_[val]_ CAUGHT", where "_[val]_" is replaced with the name of the library
output being validated. This will help our grading scripts.
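A minimal sketch of a range validator that emits the required message format. The name `validate_in_range`, the `long long` value type, and any bounds you pass in are assumptions for illustration; the real bounds must come from your own reading of the library-application contract. Writing to an `std::ostream&` (defaulting to `std::cout`) is a choice made here so the output can be tested.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if lo <= value <= hi; otherwise prints
// "ERROR: INVALID <name> CAUGHT" and returns false.
bool validate_in_range(const std::string& name, long long value, long long lo,
                       long long hi, std::ostream& out = std::cout) {
  if (value < lo || value > hi) {
    out << "ERROR: INVALID " << name << " CAUGHT" << std::endl;
    return false;
  }
  return true;
}
```

In an RLBox program, logic like this would sit inside the verifier you pass to `copy_and_verify`, which must still return some value; whether that is a fallback or an early exit is up to your design.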
You will want to replace your calls to `UNSAFE_unverified()` with
`copy_and_verify` for tainted primitives (e.g., `long long`s) and
`copy_and_verify_string` for tainted strings. You can find an example
of a string verifier
[here](https://github.com/PLSysSec/rlbox-book/blob/main/src/chapters/examples/wasm-hello-example/main.cpp#L81-L84).
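The verifier in the linked hello-world example receives the copied string as a `std::unique_ptr<char[]>`, and the function below is written in that shape so it could be handed to `copy_and_verify_string`. Treat the parameter type as an assumption to confirm against the RLBox headers, and treat the hypothetical name `verify_message`, the 1024-byte limit, the printable-characters rule, and the "completion message" label as placeholders for your own contract. On rejection it prints the required error line and returns an empty string instead of the bad data.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical verifier for the completion message. Assumes val, if non-null,
// is NUL-terminated. Rejects null, overly long, or non-printable strings.
std::string verify_message(std::unique_ptr<char[]> val) {
  constexpr std::size_t kMaxLen = 1024;  // assumed limit, not from lib.h
  bool ok = val != nullptr && std::strlen(val.get()) < kMaxLen;
  if (ok) {
    for (const char* p = val.get(); *p != '\0'; ++p) {
      if (!std::isprint(static_cast<unsigned char>(*p))) {
        ok = false;
        break;
      }
    }
  }
  if (!ok) {
    std::cout << "ERROR: INVALID completion message CAUGHT" << std::endl;
    return std::string();
  }
  return std::string(val.get());
}
```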
Once you are done, copy `app.cpp` to `sol3.cpp`.
# Notes and hints
- Read the RLBox docs and example mentioned above.
- Invoking a sandbox function requires all of its arguments to be
tainted.
- You will need to change the type signature of `on_completion()`.
- It is probably not a good idea to try to interpret the huge compiler
errors that RLBox will throw at you. A better idea would be to look
at your code and try to work out what values are tainted at which
point.
- Even though the return value of `get_hash` is an integer, it still
needs to be checked. What is the range of values that
`compute_hash` seems to have been designed to output? What is the
range of values it actually *can* output? (You should be able to
come up with an `argv[1]` input to main that causes `get_hash` to
return an out-of-intended-bounds hash result.)
# Logistics
You will submit using Gradescope. You should submit a zip file of
your solution, without directory structure. Your solution should
include at least the following files:
* `sol1.cpp`: This is your first sandboxed version of `app.cpp`. The
provided crashing inputs should not cause a segfault on this (now
sandboxed) application.
* `sol2.txt`: This should contain the command-line arguments to your
now-sandboxed application (`sol1.cpp`) that cause the application
to abort. We will test it against our reference solution.
* `sol3.cpp`: This will be your improved version of `sol1.cpp`, which
addresses the confused deputy attack vector from part 2. Neither
the original vulnerability, nor the vulnerability you exploited in
part 2, should cause this version of the application either to
segfault or abort. We will test your `sol3.cpp` against our
reference solution to parts 1 and 2.
Along with each of these files, you should also submit `writeup1.txt`,
`writeup2.txt`, and `writeup3.txt`, briefly explaining how you came to
the solution for each of the 3 parts. In particular, for
`writeup2.txt`, explicitly explain why the arguments to the program
cause the application to segfault even though you sandboxed the
library.
# Grading
The exercise will be graded out of 10 points. These will be assigned
as follows:
* One point for security of `sol1.cpp`. We will try inputs that cause
the unsandboxed library to crash and make sure they don't trigger
segfaults in your sandboxed application. (It's okay if they trigger
aborts because of failed WebAssembly bounds checks.)
* Two points for the correctness of `sol1.cpp`. We will check that
your app behaves the same on valid inputs as the original,
unsandboxed application.
* Two points if `sol2.txt` causes our reference solution to part 1 to
crash.
* Three points for security of `sol3.cpp`. We will try inputs that
cause the library to confuse the original, unsandboxed application
or that cause the library to violate the implicit
library-application contract, and make sure that your validators
catch the erroneous values and prevent the sandboxed application
from crashing or misbehaving.
* Two points for the correctness of `sol3.cpp`. We will check that
your app still behaves the same on valid inputs as the original,
unsandboxed application, even with validators added.
# Acknowledgements
This exercise is an updated version of
[PA3](https://cseweb.ucsd.edu/~dstefan/cse127-fall21/pa/pa3.html)
from UCSD's [Fall 2021 CSE
127](https://cseweb.ucsd.edu/~dstefan/cse127-fall21/). Thanks to
Deian Stefan and the CSE 127 TAs. Greets to Shravan.