In this assignment, you will implement the Gamma HJOIN architecture within a single Java program using multiple threads and Java pipes exactly as in the Gamma lecture notes -- no short cuts! Short cuts will be deducted from your grade.
Warning: this is a substantial assignment. It will take you many days to finish. Do not delay in getting started. This assignment will count 2x times a typical programming assignment, so take it seriously. Brownie points will be given to those groups who finish early.
Study carefully how these examples work. Only when you understand how each box and each regression test works, should you proceed onto the main assignment.
Connector c = new Connector("A_to_B");
ReadEnd re = c.getReadEnd();
WriteEnd we = c.getWriteEnd();
Please note: there is (may still be) a bug in Java Pipes. If you serialized tuples and pushed them through pipes, your program will fail. (I have no idea why). If tuples are serialized as Java Strings, there is no problem. This level of detail has been hidden from you -- you should only invoke getNextTuple or getNextMap methods on ReadEnds, and putNextTuple or putNextMap on WriteEnds. Study the boxes whose code I give you. You'll see the level of abstraction at which you need to program.
Another point on Java pipes. When a ReadEnd is finished (meaning there is nothing more to read) -- close it! When a WriteEnd is finished (meaning there is nothing more to write) -- close it! Typical coding idioms are:ReadEnd | WriteEnd |
while(true) { | while(true) { |
I suggest two main class hierarchies. Here's the first -- every terminal class is a primitive, whose object is a thread.
The meaning of these classes is below:
Bloom-- computes a bloom filter, as in notes. Observe the icon in the middle of the class: the box has a record stream input (on the left) and a record stream output on the right, and also a bitmap output (red).
BFilter -- applies bloom filter, as in notes. The icon in the middle of the class says that this box takes a bitmap (red) and stream (black) as input and produced a record stream as output (on the right).
BloomSimulator -- to test a Bloom box, I needed to create a box that supplies a bloom filter; that's this box.
DoNothing -- relays input messages to output messages. Initially useful to see if pipelines are working.
HJoin -- this box encodes the hashjoin algorithm
Sink -- this box is useful in testing; it simply reads its inputs and does nothing with them.
ReadRelation -- this box has 2 parameters: name of file to read and the end of a pipe in which to send Tuples of the file. (I have given this box to you).
Print -- prints each record that it receives. (I have given this box to you).
PrintMap -- reads a single bit map and prints it. Useful for debugging
HSplit -- reads in a record stream and hash-splits its contents across 4 streams (numbered 0..3). (I have given this box to you).
Merge -- reads 4 streams in round-robin order to merge their results into a single output stream.
SplitM -- splits a bitmap into 4 distinct bitmaps (inverse of MergeM)
MergeM -- merges 4 distinct bitmaps into a single bitmap.
The following boxes/classes implement pipe-and-filter graphs that implement the functionality of the primitive boxes above (and thus have the same iconic "shape"). They are not part of the Thread hierarchy as only the building blocks above are implemented as threads.
These boxes implement various non-primitive implementations of the HJoin box/primitive. In the notes, one implementation map-reduced HJoin, another implementation implemented HJoin using BFilter and Bloom boxes wired together, and finally the entire Gamma graph. The terminal boxes are subclasses of Gamma.ArrayConnectors, which provides useful ways of creating arrays of pipes in one call. Not all methods of Gamma.ArrayConnectors may be needed for this project; all are fairly simple.
join client table with viewing table over key (CLIENTNO)
join orders table with odetails table over key (ONO)
join parts table with odetails table over key (PNO)
A critical part of any design is clarity and understandability. Hence, you will be graded on the clarity of your project and its ability to work correctly. Sloppy code, documentation, or anything that makes grading or understanding your program difficult will cost you points. Beware, some of these "beauty" points are subjective.
Remember: No late assignments/submissions will be accepted.