Last semester's term project write-up is accurate up to section 4, "minimum implementation requirements". Those requirements are expanded in three areas as follows:
Your project must be easily expandable to a larger number of databases simply by compiling a file describing each additional database. Your project is simplified compared to a general-purpose system in that you may define a single global schema. I suggest that your starting point for that global schema be the Flora of Texas Consortium's schema definition or something from the Delta or Hispid specifications. If you find a different starting point, please tell me promptly.
Per our class discussions, I recommend that you consider resolving syntactic and semantic conflicts by exploiting the Java type system. Thus, besides information concerning connectivity, the database configuration file, much like Lex and Yacc input, will contain a syntactic definition and a semantic action.
For example, suppose you determine that the global schema includes an attribute called g_genus.
Then the configuration file might include:
// Definitions for DB0 table specimen
genus string[40]; {g_genus DB0_2_g_genus()}
species string[50]; {g_species ....()}
Something like this might mean: here is a Java definition for a tuple, so we may simply use regular I/O to read tuples. However, the data must promptly be transformed into the global schema representation. Inside the curly brackets is the name of a function that will convert the DB0 representation of genus to the global definition's representation. g_genus tells your system that this attribute produces the genus. Notice the trickery of the form: g_genus is a type you defined in your Java program. Though this has substance in the context of Java code, the name is also very convenient for the person writing the configuration files.
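To make the idea of a typed semantic action concrete, here is a minimal sketch. It assumes the global attribute is modelled as a small Java class (GGenus, a hypothetical name standing in for g_genus) and that the conversion function named in the curly brackets is an ordinary static method. The normalisation rule shown is only an assumption for illustration, not something this handout prescribes.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical global-schema type; the name mirrors g_genus from the example.
final class GGenus {
    private final String value;
    GGenus(String value) { this.value = value; }
    String value() { return value; }
}

final class Db0Converters {
    // Stands in for DB0_2_g_genus(). The normalisation rule (trim padding,
    // capitalise only the first letter) is an assumption made for illustration.
    static GGenus db0ToGGenus(String raw) {
        String t = raw.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) {
            return new GGenus("");
        }
        return new GGenus(Character.toUpperCase(t.charAt(0)) + t.substring(1));
    }
}

class ConversionDemo {
    public static void main(String[] args) {
        // The semantic action is just a function from the local field to the global type.
        Function<String, GGenus> action = Db0Converters::db0ToGGenus;
        System.out.println(action.apply("  QUERCUS   ").value()); // prints "Quercus"
    }
}
```

Because the converter's return type is GGenus, the compiler rejects any attempt to feed a DB0 species string into a slot that expects a genus, which is the kind of conflict detection the type system can give you for free.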
A) Language definition - obviously you must have a notion of what information the rest of the system needs to complete the definition. You must turn in a BNF or similar definition and an example from a database you will be using.
B) Parsing
C) Implement and Produce Output
D) System integration
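For the parsing step, a regular expression may be enough to get started before you commit to a full grammar. The sketch below assumes one entry per line in exactly the shape of the genus example above (local name, type with a width, then a global type and a converter name in curly brackets) and treats lines beginning with // as comments. ConfigEntry and ConfigParser are hypothetical names, and your own language definition may well differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One parsed configuration entry (hypothetical structure).
final class ConfigEntry {
    final String localName;
    final String localType;
    final int width;
    final String globalType;
    final String converter;

    ConfigEntry(String localName, String localType, int width,
                String globalType, String converter) {
        this.localName = localName;
        this.localType = localType;
        this.width = width;
        this.globalType = globalType;
        this.converter = converter;
    }
}

final class ConfigParser {
    // Matches e.g.: genus string[40]; {g_genus DB0_2_g_genus()}
    private static final Pattern ENTRY = Pattern.compile(
            "(\\w+)\\s+(\\w+)\\[(\\d+)\\]\\s*;\\s*\\{\\s*(\\w+)\\s+(\\w+)\\(\\)\\s*\\}");

    // Returns empty for blank lines and // comments; throws on malformed entries.
    static Optional<ConfigEntry> parseLine(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("//")) {
            return Optional.empty();
        }
        Matcher m = ENTRY.matcher(t);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed entry: " + line);
        }
        return Optional.of(new ConfigEntry(
                m.group(1), m.group(2), Integer.parseInt(m.group(3)),
                m.group(4), m.group(5)));
    }
}

class ConfigParserDemo {
    public static void main(String[] args) {
        ConfigEntry e = ConfigParser
                .parseLine("genus string[40]; {g_genus DB0_2_g_genus()}").get();
        System.out.println(e.localName + " -> " + e.globalType + " via " + e.converter);
    }
}
```

A line-oriented regex will not scale to nested or multi-line definitions; if your BNF needs those, a hand-written recursive-descent parser or a parser generator is the more natural choice.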
Note that there is no implementation requirement that there be a GUI for defining the configuration file.
You must support select, project and joins. The query "language" is only what a user may put in on screens you define. Thus, your query engine does not need to be a general-purpose SQL engine. However, you must support concurrent access to databases.
You will have to design one or more query input screens. These screens may include options for selecting particular databases and/or otherwise be tailored for a particular form of query. I would like to see some built-in facility for browsing the databases driven from the botanical taxonomy and/or the ability to develop and refine queries by exploiting that hierarchy. I don't have something specific in mind here, at least not beyond counting on you to invent a meaning for this requirement.
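One possible reading of the concurrency requirement is that a single query is sent to several databases at once. The sketch below illustrates that reading with select and project only (joins are left out). Each database is represented by a Callable that returns rows already converted to the global schema; FanOutQuery, the in-memory sample rows, and the "state" attribute are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class FanOutQuery {
    // Scans every source in parallel, keeps rows matching `where` (select),
    // and keeps only the named `columns` of each row (project).
    static List<Map<String, String>> run(
            List<Callable<List<Map<String, String>>>> sources,
            Predicate<Map<String, String>> where,
            List<String> columns) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<Map<String, String>> out = new ArrayList<>();
            // invokeAll waits for all sources and returns futures in input order.
            for (Future<List<Map<String, String>>> f : pool.invokeAll(sources)) {
                for (Map<String, String> row : f.get()) {
                    if (!where.test(row)) {
                        continue;
                    }
                    Map<String, String> projected = new LinkedHashMap<>();
                    for (String c : columns) {
                        projected.put(c, row.get(c));
                    }
                    out.add(projected);
                }
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a source failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

final class FanOutDemo {
    // Two in-memory stand-ins for remote databases (made-up sample rows).
    static List<Map<String, String>> sample() {
        Callable<List<Map<String, String>>> db0 = () -> List.of(
                Map.of("g_genus", "Quercus", "state", "Texas"),
                Map.of("g_genus", "Acer", "state", "Ohio"));
        Callable<List<Map<String, String>>> db1 = () -> List.of(
                Map.of("g_genus", "Yucca", "state", "Texas"));
        return FanOutQuery.run(List.of(db0, db1),
                row -> "Texas".equals(row.get("state")),
                List.of("g_genus"));
    }

    public static void main(String[] args) {
        System.out.println(sample()); // prints [{g_genus=Quercus}, {g_genus=Yucca}]
    }
}
```

If you instead read the requirement as several users querying at the same time, the same thread pool idea applies, but the pool would be shared across requests rather than created per query.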
A) Define the queries you will support.
B) Design the user screens/forms for query input.
C) Build the query engine.
D) System Integration
A simple text-based output was sufficient last year. This year some form of geographic (map) output is also required.
A) Define your various presentation options (map overlays, text, opportunities to follow links to additional information).
B) Design some example screens.
C) Implement and Produce Output
D) System Integration.
11/5 Individually turn in, hardcopy or URL or both, a document for your part A milestone.
11/12 Individually turn in, hardcopy or URL or both, a document for your part B milestone.
11/12 As a group, turn in, hardcopy or URL or both, a document providing a design overview of your system.
12/10 Public demo (group).
12/13 5:00 PM, hardcopy report, one report for each student.
You can see that last year an important aspect of the report was that much of it should read as a high-level design document. The requirements for your project report will be largely unchanged.
Recall you are only implementing a term project. Consequently, your project does not include a number of key features. I would like you to consider them as you design your project and to discuss them as issues in your final write-up. These issues include:
A user may be looking for things that grow only in Texas. The system, rather than accessing all databases, should maintain extra information about which databases can safely be ignored for any one query. How would such information be developed, stored, and exploited?
Since your system does not have this, you may choose to have some option where a meta query subsequently minimizes further use of uninteresting databases. For example, a special query form or option: select databases with plants collected from Texas. Each database could be queried on Texas; after one good tuple the data stream can be aborted.
Some species may have different names and different places in a taxonomy. The system may need a synonym table to make proper sense of certain values. How might you develop a synonym table? How would you integrate it into the architecture? Can you do so without serious impact on performance?
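The meta query idea can be sketched as a probe that reads a database's tuple stream only until the first match. The code below assumes tuples arrive as an Iterator of attribute maps already in the global schema and that collection location is stored under a "state" attribute; both are assumptions for illustration, as are the names MetaQuery and ProbeResult.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Outcome of probing one database (hypothetical structure).
final class ProbeResult {
    final boolean relevant;   // true if at least one matching tuple was seen
    final int tuplesRead;     // how many tuples were consumed before stopping

    ProbeResult(boolean relevant, int tuplesRead) {
        this.relevant = relevant;
        this.tuplesRead = tuplesRead;
    }
}

final class MetaQuery {
    // Reads tuples until one has attr == wanted, then stops immediately.
    static ProbeResult probe(Iterator<Map<String, String>> tuples,
                             String attr, String wanted) {
        int read = 0;
        while (tuples.hasNext()) {
            Map<String, String> tuple = tuples.next();
            read++;
            if (wanted.equals(tuple.get(attr))) {
                return new ProbeResult(true, read); // one good tuple is enough
            }
        }
        return new ProbeResult(false, read);
    }
}

class MetaQueryDemo {
    public static void main(String[] args) {
        // Made-up sample rows standing in for a database's tuple stream.
        List<Map<String, String>> db0 = List.of(
                Map.of("state", "Ohio"),
                Map.of("state", "Texas"),
                Map.of("state", "Texas"));
        ProbeResult r = MetaQuery.probe(db0.iterator(), "state", "Texas");
        System.out.println(r.relevant + " after " + r.tuplesRead + " tuples");
        // prints "true after 2 tuples"
    }
}
```

Note the asymmetry: a database that does hold Texas specimens is identified cheaply, but ruling a database out still requires reading its whole stream unless the source can evaluate the predicate itself.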
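As a starting point for thinking about the synonym question, the simplest form of such a table is an in-memory map from each synonym to its accepted name, consulted before values are compared. The sketch below assumes exactly that; SynonymTable is a hypothetical name, and the tomato entry is only a familiar illustration of a renamed species.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SynonymTable {
    // Maps a normalised synonym to the accepted scientific name.
    private final Map<String, String> accepted = new HashMap<>();

    void add(String synonym, String acceptedName) {
        accepted.put(key(synonym), acceptedName);
    }

    // Returns the accepted name if the input is a known synonym,
    // otherwise returns the input unchanged.
    String resolve(String name) {
        return accepted.getOrDefault(key(name), name);
    }

    // Case and surrounding whitespace are ignored when looking up names.
    private static String key(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}

class SynonymDemo {
    public static void main(String[] args) {
        SynonymTable table = new SynonymTable();
        table.add("Lycopersicon esculentum", "Solanum lycopersicum");
        System.out.println(table.resolve("lycopersicon esculentum")); // prints "Solanum lycopersicum"
        System.out.println(table.resolve("Quercus alba"));            // prints "Quercus alba"
    }
}
```

A hash lookup per value is cheap, so the performance question is less about the lookup itself and more about where in the pipeline you apply it, for example inside the conversion functions versus after results are merged.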
More of these will come up.