CS303E Assignment 8

Due Wednesday, April 17th by 11pm

Warning: This assignment is algorithmically difficult. Start early, and be certain to create an algorithm before you begin programming. As a reminder, you can get help in office hours.

A strand of DNA is formed from four nucleotides called adenine (abbreviated A), thymine (T), cytosine (C), and guanine (G). Genetic information is determined by the sequence of nucleotides along a strand.

For this assignment, you will write a program that finds the longest common nucleotide sequence in two strands of DNA. Each strand is represented by a sequence of letters from {A, T, C, G}. A common sequence is a run of consecutive nucleotides that appears in both strands. For example, in the strands ATGC and TGAC, the longest common sequence is TG. The two strands are not required to have the same length, and it is possible for them to have no common sequence at all (a sequence of length 1 does not count).
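To make the ATGC and TGAC example concrete, here is a small illustration, not part of the required program. It assumes Python's `in` operator on strings, which may or may not have been covered in class. It shows why TG counts as a common sequence while TGC does not: the letters must be consecutive in both strands.

```python
# Illustration only: a common sequence must appear as consecutive
# letters in BOTH strands.
strand1 = "ATGC"
strand2 = "TGAC"

# "TG" appears as consecutive letters in both strands.
candidate = "TG"
isCommon = candidate in strand1 and candidate in strand2

# "TGC" is consecutive in ATGC, but in TGAC an A separates the G and the C.
longer = "TGC"
isLongerCommon = longer in strand1 and longer in strand2

print(isCommon, isLongerCommon)
```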

You may work in pairs for this assignment, but you must follow the guidelines covered in Pair Programming Guidelines. If you work in a pair, you will receive two extra credit points on your grade for this assignment. Your partner must be someone in your lecture with whom you have not partnered previously this semester.

Note: You may not use any programming construct that we have not covered in class.

File Name: DNA.py (Note the all caps on DNA!)

Your program will prompt the user to enter two strands of DNA and then write out all of the longest common subsequences, one per line. There may be 0, 1, 2, or more longest common subsequences. The program should do this using the getStrands(), longestCommonSubseq(), and printCommonSubseqs() functions, which should all be called from your main() function. Your program should continue prompting the user until the user enters the empty string. (To enter the empty string, just press return.)

getStrands() is a function written by you that prompts the user for two strands and returns a tuple containing the two DNA strands.
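One possible shape for getStrands() is sketched below. The prompt text is copied from the sample run. Converting the input to upper case inside this function is an assumption made here so that later comparisons can ignore case; the assignment does not say where that conversion should happen.

```python
def getStrands():
    # Prompt for both strands; the wording follows the sample run.
    strand1 = input("Please enter a strand of DNA: ")
    strand2 = input("Please enter another strand of DNA: ")
    # Assumption: upper-case here so the rest of the program can ignore case.
    return (strand1.upper(), strand2.upper())
```

Because the function returns a tuple, the caller could unpack it with something like `first, second = getStrands()`.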

longestCommonSubseq() is a function written by you that accepts two DNA sequences and returns the length of the longest common subsequence of the two strands.
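A minimal brute-force sketch of this function follows. It is one approach among several, not necessarily the intended one. It tries the largest possible length first and shrinks until some run of that length from one strand also appears in the other. Returning 0 when nothing of length 2 or more is shared is an assumption; the assignment does not specify the return value for that case. The sketch also assumes both strands are already in the same case.

```python
def longestCommonSubseq(strand1, strand2):
    # A common run can be no longer than the shorter strand.
    length = min(len(strand1), len(strand2))
    # Runs of length 1 do not count, so stop at 2.
    while length >= 2:
        # Slide a window of this length across strand2.
        for start in range(len(strand2) - length + 1):
            piece = strand2[start:start + length]
            if piece in strand1:
                return length
        length -= 1
    # Assumption: 0 signals that no common run of length >= 2 exists.
    return 0
```

This does far more work than necessary on long strands, but it keeps the logic short and easy to check by hand.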

printCommonSubseqs() is a function written by you that accepts three parameters: 2 DNA strands and a length, in that order. The function finds all common subsequences of the specified length and prints them to the screen.
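The sketch below shows one way printCommonSubseqs() might work. It slides a window of the given length along the second strand and prints every window that also occurs in the first strand. Scanning the second strand, and printing repeats each time they occur there, are assumptions chosen because they happen to match the sample run; the assignment does not state how repeats should be handled. The sketch prints only the subsequences themselves, leaving any heading or "not found" message to the caller.

```python
def printCommonSubseqs(strand1, strand2, length):
    # Look at every run of `length` consecutive letters in strand2.
    for start in range(len(strand2) - length + 1):
        piece = strand2[start:start + length]
        # Print the run if it also appears somewhere in strand1.
        if piece in strand1:
            print(piece)
```

For the strands ATGGCATAAGCTT and TGCAGCTGCATCAGGAT with a length of 4, this prints AGCT and then GCAT.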

Here is a sample run of the program. The user's input is the text that follows each prompt:

 
Please enter a strand of DNA: ATGGCATAAGCTT
Please enter another strand of DNA: TGCAGCTGCATCAGGAT
Common subsequence(s):
AGCT
GCAT

Please enter a strand of DNA: TAGGCAT
Please enter another strand of DNA: GAA
No common subsequence was found for TAGGCAT and GAA

Please enter a strand of DNA: AAAAAaaaaaAAAAAAAAaaaaa
Please enter another strand of DNA: aaaaaaaaaccaaaaaaaa
Common subsequence(s):
AAAAAAAAA

Please enter a strand of DNA: AAGGTTCCAAGGTTCC
Please enter another strand of DNA: CCTTGGAACCTTGGAA
Common subsequence(s):
AA
GG
TT
CC
AA
GG
TT
CC

Please enter a strand of DNA: AGTCAGTCAGTCAGTC
Please enter another strand of DNA: TCGATCGA
Common subsequence(s):
TC
TC

Please enter a strand of DNA: AGTCAGTC
Please enter another strand of DNA: CTGACTGACTGACTGA
No common subsequence was found for AGTCAGTC and CTGACTGACTGACTGA

Please enter a strand of DNA: 
Please enter another strand of DNA: 
Ending program

Your subsequences may print in any order, but the rest of your output should match this sample output. You may assume the user enters strings with only A, C, G, and T (upper or lower case) unless they enter a string of length 0 (also known as an empty string).

Test your program by running it several times. Include test runs covering at least 5 pairs of sequences, along with their output, in comments at the end of your .py file.

Name your file DNA.py. Be certain to begin your file with the following header:


# File: --name of file--
# Description: --a description of your program--
# Assignment Number:
#
# Name: --your name--
# EID: --your eid--
# Course Name: CS303E
#
# Unique Number: --your section number--
#
# Date created:
# Date last modified:
#
# Slip days used this assignment:
# Total slip days used:

If you used pair programming, include name, EID, course name, and unique number for both partners along with the other required information.

Use the turnin program to submit your file. The file must be turned in by April 17th at 11pm. If you use slip days, please notify your TA when you turn in your file. Your program will be graded according to the following general grading criteria:
Correctness: Does the program pass the provided and additional test cases? Take off points for failures.
Testing: Did the students add more test cases?
Documentation: Are the methods and complicated code segments documented (especially the complicated code)? Is the header correctly filled in?
Design/Efficiency: Program should not contain huge, convoluted methods that should be broken up into other methods. The algorithm is not trivial, and students often use huge methods that go on for 50 lines or more. They need to modularize the code.

If you used pair programming, both partners must have enough slip days left to cover any slip days you use. (e.g., If you use two, both partners must have at least two left to use.) Turn the assignment in using just one partner's turnin account. The grader will grade it and enter the grade for both partners.

Did you remember to: