DNA Sequences ( Due 08 Feb 2008 )

DNA, or deoxyribonucleic acid, is a nucleic acid that carries genetic information and is responsible for the propagation of inherited traits. DNA is organized as two complementary strands in the structure that Watson and Crick described as the double helix. Each strand is built out of nucleotides, each identified by one of four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). The bases of the two complementary strands pair up in this order: A+T, T+A, C+G, and G+C. Strands have directionality, so the order of the sequence matters. Genetic information is determined by the sequence of bases along the strand.

DNA has played an important role in computer science research. For example, research in string searching algorithms has been motivated by the search for sequences in DNA. For the present assignment, we are interested in finding the longest common base sequence in two DNA strands, that is, the longest run of consecutive bases that appears in both. Each strand is represented by a sequence of the letters A, T, C, and G. For the two strands ACTG and TGCA, the longest common sequence is TG. It is quite possible for two strands not to have any common sequence (a sequence of 1 base does not count). There may also be two or more common sequences that share the same longest length.

Your program will accept as input two strings representing the two DNA strands. The maximum length of each string is 100 characters. You will check that each string consists only of the characters 'A', 'T', 'C', and 'G'. It is acceptable if the input is in lower case or in mixed upper and lower case. If the length of either string is not in the range 1 to 100 (inclusive), or if either string contains illegal characters, then write out an error message and exit the program using the return statement instead of System.exit(0).

if (errorCond)
{
  System.out.println ("Error Message");
  return;
}
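
The checks described above could be gathered into one helper method. The following is only a minimal sketch, not a required design: the class name StrandValidator and the method name isValidStrand are made up for illustration, and the braces follow the alignment convention requested later in this handout.

```java
class StrandValidator
{
  // Returns true when the strand has 1 to 100 characters and every
  // character is A, T, C, or G in either case.
  // (Hypothetical helper; the name is an assumption, not part of the assignment.)
  static boolean isValidStrand (String strand)
  {
    if (strand == null || strand.length() < 1 || strand.length() > 100)
    {
      return false;
    }
    String upper = strand.toUpperCase();
    for (int i = 0; i < upper.length(); i++)
    {
      // indexOf returns -1 when the character is not one of the four bases
      if ("ATCG".indexOf (upper.charAt (i)) == -1)
      {
        return false;
      }
    }
    return true;
  }

  public static void main (String[] args)
  {
    System.out.println (isValidStrand ("acTG"));
    System.out.println (isValidStrand ("ACXG"));
    System.out.println (isValidStrand (""));
  }
}
```

A main method could call this helper on each input string and, when it returns false, print the error message and return as in the snippet above.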

Convert both strings to upper case. Print out the longest common sequence(s) for the two strings. If there are multiple longest common sequences of the same length, print all of them out, one to a line. If there is no common sequence, your program should output No common sequence found.
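
One possible way to find the sequences is sketched below. It assumes that a common sequence means a run of consecutive bases present in both strands, and it uses a table of run lengths; this is only one approach among several, and the names CommonSequenceFinder and longestCommon are hypothetical. Both arguments are assumed to be already validated and converted to upper case.

```java
import java.util.ArrayList;
import java.util.List;

class CommonSequenceFinder
{
  // Returns every distinct longest run of consecutive bases found in both
  // strands, in the order encountered while scanning the first strand.
  // Runs of length 1 are ignored, so the list may be empty.
  static List<String> longestCommon (String first, String second)
  {
    List<String> found = new ArrayList<String>();
    int best = 1;
    // run[i][j] is the length of the common run that ends at
    // first.charAt(i - 1) and second.charAt(j - 1)
    int[][] run = new int[first.length() + 1][second.length() + 1];
    for (int i = 1; i <= first.length(); i++)
    {
      for (int j = 1; j <= second.length(); j++)
      {
        if (first.charAt (i - 1) != second.charAt (j - 1))
        {
          continue;
        }
        run[i][j] = run[i - 1][j - 1] + 1;
        if (run[i][j] > best)
        {
          // a longer run replaces everything collected so far
          best = run[i][j];
          found.clear();
        }
        if (run[i][j] == best && best > 1)
        {
          String candidate = first.substring (i - best, i);
          if (!found.contains (candidate))
          {
            found.add (candidate);
          }
        }
      }
    }
    return found;
  }

  public static void main (String[] args)
  {
    System.out.println (longestCommon ("ACTG", "TGCA"));
  }
}
```

An empty list corresponds to the case in which the program should report that no common sequence was found. With strands of at most 100 characters, the table holds at most 101 by 101 entries, so the quadratic cost is negligible.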

A sample session would look like this:

Enter 1st DNA strand: ACTG
Enter 2nd DNA strand: TGCA

Common sequence(s): TG
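
The prompts in the session above could be produced with java.util.Scanner. The sketch below is an illustration only: the class name StrandPrompt and the method readStrand are hypothetical, and trimming surrounding whitespace is an assumption that the assignment does not state.

```java
import java.util.Scanner;

class StrandPrompt
{
  // Prints the prompt, reads one line, and returns it trimmed and in upper
  // case. Returns an empty string when no input line is available.
  // (Trimming is an assumption; the assignment does not mention whitespace.)
  static String readStrand (Scanner in, String prompt)
  {
    System.out.print (prompt);
    if (!in.hasNextLine())
    {
      return "";
    }
    return in.nextLine().trim().toUpperCase();
  }

  public static void main (String[] args)
  {
    Scanner in = new Scanner (System.in);
    String first = readStrand (in, "Enter 1st DNA strand: ");
    String second = readStrand (in, "Enter 2nd DNA strand: ");
    System.out.println ();
    System.out.println ("Read " + first + " and " + second);
  }
}
```

Because an empty string falls outside the allowed length range of 1 to 100, missing input would be caught by the same error check as any other invalid strand.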

For this assignment you may work with another student in the class or on your own. If you want a partner but have not found one, go to the discussion group on Blackboard and post your contact information. You may use any class available in the standard Java library for your solution. The file that you turn in will be called DNASequence.java. The file will have a header of the following form:

/*
  File: DNASequence.java

  Description:

  Student Name:

  Student UT EID:

  Partner's Name:

  Partner's UT EID:

  Course Name: CS 313E

  Unique Number (55490/55495):

  Date Created:

  Date Last Modified:

*/

You will follow the standard Java Coding Conventions. You can either view the HTML page or download the PDF or PostScript version and print it out. I would like to make one modification to the standard coding conventions: please align the opening and closing braces vertically so that you can easily make out the blocks of code. For example:

Do this:
if ( x > 5 )
{
  a = b + c;
}

Not this:
if ( x > 5 ) {
  a = b + c;
}

Use the turnin program to submit your DNASequence.java file. The TAs should receive your work by 5 PM on Friday, 08 February 2008. There will be substantial penalties if you do not adhere to the guidelines. The TA in charge of this assignment is Adam Wilkinson (a.wilkinson@mail.utexas.edu).