SQL Mini Data Base (MDB)

MDB is a miniature database system that supports a useful subset of SQL.  The primary limitations of MDB are that only string and integer types are supported, there are no aggregations, and selection predicates must be conjunctive.

In a series of assignments, you will implement MDB.  Each assignment gives you a context in which to work and the goals to achieve.   In the end, there will be a contest to see which version of MDB runs the fastest on a set of benchmarks.  Big brownie points are awarded to the winner. MDB supports the following DDL commands:

MDB supports the following DML commands:

The grammar for the above language is in file source/grammar.b file in the zip file mdb.zip (this is the extra code referenced below).

Project #2: Data Definition Language and Database Loading

You will be using Berkeley DB (BDB) Java Edition, a Java-based inverted file system.  Your assignment is to:

Note: you should define an interface that allows you to store and retrieve tuples from relations.  Initially, your implementation of this interface will be very simple: you map a relation to a BDB file.  There are no indices at all.  Retrievals are done either via key search or scan using BDB cursors.

To walk parse trees, you'll need to read sections on Abstract Syntax Trees (ASTs) and AST Cursors to see how to traverse and harvest information.

Hints:

Project #3: Data Manipulation Language and Query Processing

The goal of the second project is to provide functionality for all but the Decl_ind DDL statement. There is no query optimization: you are to process all queries as they are written, say using relation scans and nested loops. Don't worry about the presence of indices or permutation of joins, just provide MDB functionality.

My recommendation is (in this order) to:

  1. process single-relation queries. (You'll have to segment a SELECT predicate into local predicates and join predicates (just as we reviewed in class)).
  2. try the delete statements and update statements, that leverage single-relation queries.
  3. process multi-relation queries.

What should be stated more clearly in the assignment is that commit and abort is something that you can postpone until the next assignment, which is about query optimization (and indices), but really also transaction support. You should plan ahead on transactions, as you'll be relying on BDB's  transaction capability. Commit and abort should be straightforward. If you handle commit and abort in this project, that's fine too.

Project #4: Query Optimization

You now need to implement the DDL Index statement of MDB DDL.  This will involve the modification of your code that you wrote in the last assignment.  BDB has support for indices, and you should do your best to use their facilities. Further, you should direct BDB to use a particular index for query optimization.  In the case of multi-table joins, you should consider optimizing queries involving 2 or 3 relations (as you can enumerate the possibilities easy enough).  Another possible extension is that you add another join algorithm.  You must clearly document what optimizations you have included, and performance evidence that your optimizations work.

Test scripts that you can use are: