SQL Mini Data Base (MDB)

MDB is a miniature database system that supports a useful subset of SQL. The primary limitations of MDB are that only string and integer types are supported, there are no aggregations, and selection predicates must be conjunctive.

In a series of assignments, you will implement MDB. Each assignment gives you a context in which to work and the goals to achieve. In the end, there will be a contest to see which version of MDB runs the fastest on a set of benchmarks. Big brownie points are awarded to the winner. MDB supports the following DDL commands:

Create -- create a relation.
Index -- create an index
Show -- print the attributes of the given relation.

MDB supports the following DML commands:

Abort -- rollback all updates since the last abort or commit.
Close -- close a database that has no uncommitted updates. Uncommitted updates must be aborted or committed prior to a close.
Commit -- commit all updates since the last abort or commit.
Delete -- delete the tuples associated with a single-relation (i.e. non-join) predicate.
Exit -- exit MDB. This will close the currently opened database.
Insert -- insert tuple into database.
Open -- open a database for update and retrieval. Only one database can be open at any time.
Script -- run the script in the designated file.
Select -- retrieve tuples from one or more relations.
Update -- update zero or more tuples in a single relation.

The grammar for the above language is in file source/grammar.b file in the zip file mdb.zip (this is the extra code referenced below).

You will also need to place Jakarta.jar on your classpath

Project #2: Data Definition Language and Database Loading

You will be using Berkeley DB (BDB) Java Edition, a Java-based inverted file system. Your assignment is to:

First, download BDB, read its documentation, and figure out how to store and retrieve tuples of a relation. Click here for information.
Second, I have provided a parser for the MDB project. It is a set of Java files that read a command-line as input and parses the command line. If the parse is successful, the Main.main() method prints out the contents of the parse tree. To invoke it, you should type:

> java mdb.Main
> // type in your mdbsql query here
> . // terminate with a line with a sole dot

or if you want to run a script, type:

> java mdb.Main
> script "path to file";
> .

The code and an extra zip file that MDB needs is given here, along with installation instructions. Some test scripts are here, and three relations of data for loading is here. A mail-order database is in two SQL scripts here.
Third, you need to write code that walks the parse tree produced by MDB, and translate that into actions on Berkeley DB. The only actions that you need to take for this assignment is (a) to process relation create statements, (b) process insert statements to populate relations.

Note: Each production and pattern of the MDB DDL and DML grammar is identified with a Java class. The actions to execute an instance of that class is currently encoded in an execute() method of that class. Your goal is to fill in the actions of the execute methods of specific classes to do your "thing".

Note: you should define an interface that allows you to store and retrieve tuples from relations. Initially, your implementation of this interface will be very simple: you map a relation to a BDB file. There are no indices at all. Retrievals are done either via key search or scan using BDB cursors.

To walk parse trees, you'll need to read sections on Abstract Syntax Trees (ASTs) and AST Cursors to see how to traverse and harvest information.

Hints:

Do you need to check the sanity (data type integrity) of the field values in an insert command?

Ans: realize that there is an endless number of integrity checks that you could perform. Add any that you feel comfortable. It will help you debug your assignment. However, realize that scripts used for grading will not have errors.

Project #3: Data Manipulation Language and Query Processing

The goal of the second project is to provide functionality for all but the Decl_ind DDL statement. There is no query optimization: you are to process all queries as they are written, say using relation scans and nested loops. Don't worry about the presence of indices or permutation of joins, just provide MDB functionality.

My recommendation is (in this order) to:

process single-relation queries. (You'll have to segment a SELECT predicate into local predicates and join predicates (just as we reviewed in class)).
try the delete statements and update statements, that leverage single-relation queries.
process multi-relation queries.

What should be stated more clearly in the assignment is that commit and abort is something that you can postpone until the next assignment, which is about query optimization (and indices), but really also transaction support. You should plan ahead on transactions, as you'll be relying on BDB's transaction capability. Commit and abort should be straightforward. If you handle commit and abort in this project, that's fine too.

Project #4: Query Optimization

You now need to implement the DDL Index statement of MDB DDL. This will involve the modification of your code that you wrote in the last assignment. BDB has support for indices, and you should do your best to use their facilities. Further, you should direct BDB to use a particular index for query optimization. In the case of multi-table joins, you should consider optimizing queries involving 2 or 3 relations (as you can enumerate the possibilities easy enough). Another possible extension is that you add another join algorithm. You must clearly document what optimizations you have included, and performance evidence that your optimizations work.

Test scripts that you can use are:

script #1
script #2 (it has 1K tuples)
script #3 (it has 10K tuples)