Longest Common Subsequence (Due 22 July 2011)

Each strand of DNA is built out of four nucleotides (or bases) called adenine (A), thymine (T), cytosine (C), and guanine (G). Genetic information is determined by the sequence of bases along the strand.

This program will find the longest common base sequence in two strands of DNA. Each strand is represented by a sequence of the letters A, T, C, and G. Here a common sequence means a run of consecutive bases that appears in both strands. For example, in the two strands ATGC and TGAC the longest common sequence is TG. The two strands need not have the same length. It is quite possible for the two strands not to have any common sequence (a sequence of 1 base does not count).

Input: Prompt the user to enter two strands of DNA, one strand at a time.

Output: Write out all the longest common sequences, one per line. If you do not find any common sequence, write No Common Sequence Found.

Here is what a typical session would look like:

Enter first strand: ATGGCATAAGCTT
Enter second strand: TGCAGCTGCATCAGGAT

Common Subsequence(s):
GCAT
AGCT
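
The sample session above can be checked by hand with a few lines of Python. This is only an illustrative sketch: the helper name common_of_length is hypothetical (it is not part of the assignment), and Python 3 syntax is assumed. It lists the substrings of one strand, of a given length, that also occur in the other strand.

```python
# Strands taken from the sample session.
first = "ATGGCATAAGCTT"
second = "TGCAGCTGCATCAGGAT"


def common_of_length(a, b, n):
    """Return the length-n substrings of a that also occur in b.

    Hypothetical helper for illustration only. Results keep the order
    in which they appear in a, and repeats are listed once.
    """
    found = []
    for i in range(len(a) - n + 1):
        piece = a[i:i + n]
        if piece in b and piece not in found:
            found.append(piece)
    return found


print(common_of_length(first, second, 5))  # prints []
print(common_of_length(first, second, 4))  # prints ['GCAT', 'AGCT']
```

No run of 5 bases is shared by the two strands, so the two runs of 4 bases are the longest common sequences, matching the session.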

Your program will be called DNA. Generate all the substrings of the shorter DNA strand, starting with the longest substrings. Use the find() function to determine whether each substring occurs in the longer strand of DNA. We will be looking for good documentation and adherence to the coding conventions discussed in class. Your file DNA.py will have the following header:


#  File: DNA.py

#  Description:

#  Student Name:

#  Student UT EID:

#  Course Name: CS 303E

#  Unique Number: 

#  Date Created:

#  Date Last Modified:
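
One possible way to organize the approach described above (substrings of the shorter strand, longest first, tested with find()) is sketched here. It is not the required solution. The function names longest_common and main are assumptions, Python 3 is assumed (input() and print()), and listing a repeated match only once is a choice the assignment does not specify.

```python
def longest_common(strand1, strand2):
    """Return the longest substrings shared by both strands.

    Hypothetical helper: the result is a list, empty when the strands
    share no run of at least 2 bases.
    """
    # Work from the shorter strand, as the assignment suggests.
    if len(strand1) <= len(strand2):
        shorter, longer = strand1, strand2
    else:
        shorter, longer = strand2, strand1

    # Try the largest length first; stop at 2 because a sequence
    # of 1 base does not count.
    for length in range(len(shorter), 1, -1):
        matches = []
        for start in range(len(shorter) - length + 1):
            piece = shorter[start:start + length]
            # str.find() returns -1 when piece is not in longer.
            if longer.find(piece) != -1 and piece not in matches:
                matches.append(piece)
        if matches:
            # Everything found at this length is a longest match.
            return matches
    return []


def main():
    # Prompts and messages follow the sample session.
    first = input("Enter first strand: ")
    second = input("Enter second strand: ")
    print()
    matches = longest_common(first, second)
    if matches:
        print("Common Subsequence(s):")
        for piece in matches:
            print(piece)
    else:
        print("No Common Sequence Found")
```

Calling main() runs one interactive session. Because the search starts at the largest length and returns as soon as any match is found, every substring reported has the same, maximal length.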

Use the turnin program to submit your DNA.py file. The TAs should receive your work by 11 PM on Friday, 22 July 2011. There will be substantial penalties if you do not adhere to the guidelines.