Programming Assignment 5
CS 303e



Covered topics:

reading input from the user
loops
boolean conditions
string manipulation and slicing

You may not use any programming constructs or concepts that we have not covered in class.

A strand of DNA is formed from four nucleotides called adenine (abbreviated A), thymine (T), cytosine (C), and guanine (G). Genetic information is determined by the sequence of nucleotides along a strand.

In this project, you will write a program that finds the longest common nucleotide sequence in two strands of DNA. Each strand is represented by a sequence of letters from {A, T, C, G}. A common sequence is a run of consecutive nucleotides that appears in both strands. For example, in the strands ATGC and TGAC, the longest common sequence is TG. The two strands are not required to have the same length, and it is possible for the two strands to have no common sequence at all (a sequence of length 1 does not count).

Your program will prompt the user to enter two strands of DNA. It will then print all of the longest common subsequences, one per line. There may be zero, one, or several longest common subsequences.

Sample Run:
Enter the first strand: ATGGCATAAGCTT
Enter the second strand: TGCAGCTGCATCAGGAT

Common subsequence(s):
GCAT
AGCT

Sample Run:
Enter the first strand: TAGGCAT
Enter the second strand: GAA

No common subsequence was found for TAGGCAT and GAA.
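
The two output formats in the sample runs above could be produced along these lines. This is only a sketch: the helper name formatResults, its parameter names, and the idea of passing the results as a list are assumptions made for illustration and are not required by the assignment.

```python
def formatResults(strand1, strand2, commonList):
    # Hypothetical helper: builds the text shown in the sample runs.
    # commonList is assumed to hold the longest common subsequences.
    if len(commonList) == 0:
        return ("No common subsequence was found for "
                + strand1 + " and " + strand2 + ".")

    # One header line, then one subsequence per line.
    text = "Common subsequence(s):"
    for seq in commonList:
        text = text + "\n" + seq
    return text


# The sample runs print a blank line before the results.
print()
print(formatResults("TAGGCAT", "GAA", []))
```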

Use the coding conventions we have discussed and used in class (e.g., conventions for variable names) and include whitespace, comments, and indentation to make your program more readable. Write and use the following functions:

1. getStrands(): This function prompts the user for the two strands, and returns a tuple that contains the two DNA strands.
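
A minimal sketch of what getStrands() might look like is shown here. The prompt text is copied from the sample runs; the local variable names are assumptions, and no input checking is done because the handout does not ask for any.

```python
def getStrands():
    # Prompt text matches the sample runs in this handout.
    strand1 = input("Enter the first strand: ")
    strand2 = input("Enter the second strand: ")

    # Return both strands together as a tuple.
    return (strand1, strand2)
```

A caller could unpack the tuple in one step, for example with strand1, strand2 = getStrands() inside main().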

2. longestCommonSubseq(string1, string2): This function takes two DNA sequences and returns the longest common subsequence(s) of string1 and string2.

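Think about how you can use the string find() method, which returns the index of the first occurrence of a substring, or -1 if the substring does not occur.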
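
As a small illustration of find(), the lines below look for two pieces inside the second strand from the first sample run. The helper name occursIn is an assumption introduced only for this example.

```python
strand = "TGCAGCTGCATCAGGAT"

# find() returns the index of the first match, or -1 if there is none.
print(strand.find("GCAT"))   # prints 7
print(strand.find("GGG"))    # prints -1


def occursIn(piece, strand):
    # Hypothetical helper: True if piece appears somewhere in strand.
    return strand.find(piece) != -1
```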

Think about the efficiency of your program. If your program does unnecessary work, you will lose some points.
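
One way to avoid unnecessary work is to try the longest candidate pieces first and stop as soon as any length produces a match, since shorter pieces can no longer be the longest. The sketch below follows that idea under some assumptions: it returns the results as a list, it uses a list to skip duplicates, and it is one possible approach rather than the expected solution.

```python
def longestCommonSubseq(string1, string2):
    # Assumption: results are returned as a list, which is empty
    # when the strands share no piece of length 2 or more.
    found = []

    # A common piece cannot be longer than the shorter strand.
    length = min(len(string1), len(string2))

    # Try long pieces first; pieces of length 1 do not count.
    while length >= 2 and len(found) == 0:
        # Slide a window of this length across string1.
        for start in range(len(string1) - length + 1):
            piece = string1[start:start + length]
            if string2.find(piece) != -1 and piece not in found:
                found.append(piece)
        length = length - 1

    return found


print(longestCommonSubseq("ATGGCATAAGCTT", "TGCAGCTGCATCAGGAT"))
```

Because the loop ends at the first length that yields a match, shorter pieces are never examined once a longer common piece has been found.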

Your output should look like the sample output above. You will lose credit if it does not.

Programs that contain syntax errors will not receive any credit. Please plan ahead and allow plenty of time to get help from the TAs, proctors, or instructor if you are having trouble with the program. Please do not wait until the day before the assignment is due to email the course staff for help; it is unlikely that we will respond quickly enough to assist you. Plan to come to the lab during office hours if you need help.

Save your program in a file called DNA.py. This program should include ample comments, and should use whitespace, indentation, and meaningful variable names to enhance readability. Include a header in your program as indicated in the description of project 1.

This program must be submitted by 11 pm on the due date in order to be considered on time. Please note in your program header how many slip days you used on this project, if any.

The proctors will be grading this project. If you have any questions about grading, contact them first. If you submit this project late using slip days, you must email the proctors after your program is submitted and let them know that your project is ready to be graded.

This project must be done individually. You may talk to your classmates about solution approaches, but then you must write your own code. 


Reminders - Did you remember to:
    do this assignment by yourself?
    use meaningful variable names?
    include comments for readability?
    make sure that your program does not produce an error?
    make sure that your output matches the sample output above?
    call your main() function?
    submit your program in file DNA.py using the turnin program by 11 pm on the due date?
    email the proctors after you have submitted your project, if you are using slip days?