This assignment is used as a dry run on Gradescope. The grade for this assignment will not count towards the final grade for the course.


DNA Sequences ( Due 01 Sep 2023 )

DNA, or deoxyribonucleic acid, is a nucleic acid that contains genetic information. It is responsible for the propagation of inherited traits. DNA is organized as two complementary strands that Watson and Crick called the Double Helix. Each strand is built out of nucleotides called bases, of which there are four: adenine (A), thymine (T), cytosine (C), and guanine (G). The bases of the two complementary strands that make up the DNA pair up in this order: A+T, T+A, C+G, and G+C. Strands have directionality, and the sequence order matters. Genetic information is determined by the sequence of bases along the strand.

DNA has played an important role in computer science research. For example, research in string searching algorithms has been motivated by finding sequences in DNA. For the present assignment, we are interested in finding the longest common base sequence in two DNA strands, that is, the longest run of consecutive bases that appears in both strands. Each strand is represented by a sequence of the letters A, T, C, and G. For the two strands ACTG and TGCA, the longest common sequence is TG. It is quite possible for two strands not to have any common sequence (a sequence of 1 base does not count). There could also be two or more common sequences that share the same longest length.
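
To make the definition concrete, here is a minimal brute-force sketch, not a required or official solution. It assumes that a common sequence means a run of consecutive bases at least 2 long. The names all_substrings and longest_common_sketch are hypothetical and are not part of the assignment template.

```python
def all_substrings(strand, min_length=2):
    """Return the set of every run of consecutive bases of at least min_length."""
    return {
        strand[start:end]
        for start in range(len(strand))
        for end in range(start + min_length, len(strand) + 1)
    }


def longest_common_sketch(strand_a, strand_b):
    """Return the sorted list of the longest runs shared by both strands."""
    # Assumption: a run of a single base does not count (min_length=2).
    shared = all_substrings(strand_a) & all_substrings(strand_b)
    if not shared:
        return []
    longest = max(len(piece) for piece in shared)
    return sorted(piece for piece in shared if len(piece) == longest)


print(longest_common_sketch("ACTG", "TGCA"))  # prints ['TG']
```

Because each strand has at most 80 characters, building every substring is affordable here; a longer input would call for a more careful algorithm.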

Input: Your program will read its data from standard input (stdin). Redirect the text file dna.in into your program on the command line.

On Mac or Linux:

python3 DNA.py < dna.in

On Windows:

python DNA.py < dna.in

The first line of data is an integer n that gives the number of pairs of DNA strands to follow. You will read one pair of DNA strings at a time. The maximum length of each string is 80 characters. Assume that each string consists only of the characters 'A', 'T', 'C', and 'G'. A string may be in lower case or in mixed upper and lower case; convert both strings to upper case before comparing them.
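
One way the reading step might look is sketched below. It assumes that the count and every strand sit on their own lines and that blank lines can be skipped; the handout does not spell out the exact layout of dna.in. The function name read_pairs is hypothetical.

```python
import io


def read_pairs(stream):
    """Read the pair count, then that many pairs of strands, from a text stream."""
    # Assumption: one value per line; blank lines are ignored.
    lines = [line.strip() for line in stream if line.strip()]
    num_pairs = int(lines[0])
    pairs = []
    for pair_index in range(num_pairs):
        # Convert to upper case so mixed-case input compares correctly.
        first_strand = lines[1 + 2 * pair_index].upper()
        second_strand = lines[2 + 2 * pair_index].upper()
        pairs.append((first_strand, second_strand))
    return pairs


# Stand-in for a redirected dna.in, used only for this demonstration.
sample_input = io.StringIO("2\nactg\nTGca\nGGACTGAT\ntgatggac\n")
print(read_pairs(sample_input))
```

In the actual program the same function could be called as read_pairs(sys.stdin) after import sys, since the shell redirection makes dna.in available on stdin.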

Output: Print out the longest common sequence(s) for the two strings. If there is more than one longest common sequence then print each of those sequences on separate lines in alphabetical order. Use the built-in sort function for lists. The sequences should be left aligned. Leave a blank line between the output of each input pair. There should be a blank line at the end. If there is no common sequence your program should output No Common Sequence Found.

If the same longest common sequence is found more than once, output it only once.
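
The output rules can be sketched as follows. This is an illustration under the rules stated above, not the grader's reference code, and the helper name format_result is hypothetical.

```python
def format_result(sequences):
    """Return the text to print for one input pair, without the trailing blank line."""
    if not sequences:
        return "No Common Sequence Found"
    unique_sequences = list(set(sequences))  # drop identical repeats
    unique_sequences.sort()                  # built-in list sort, alphabetical
    return "\n".join(unique_sequences)


for result in (["TCG"], ["TGAT", "GGAC", "GGAC"], []):
    print(format_result(result))
    print()  # blank line after every pair, including the last
```

The loop prints one result per pair and follows each with a blank line, so the output also ends with a blank line.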

Sample output session would look like:

TCG

GGAC
TGAT

No Common Sequence Found

Your program should have a good, clean logical structure. We will be looking at good documentation and descriptive variable names. You will adhere to the standard coding conventions in Python. Your file DNA.py must have the following header:


#  File: DNA.py

#  Description:

#  Student Name:

#  Student UT EID:

#  Partner Name:

#  Partner UT EID:

#  Course Name: CS 313E

#  Unique Number: 

#  Date Created:

#  Date Last Modified:

# Input: s1 and s2 are two strings that represent strands of DNA
# Output: returns a sorted list of the longest substrings that are
#         common to both strands. The list is empty if there are no
#         common substrings of two or more bases.
def longest_subsequence (s1, s2):
  pass  # placeholder: replace with your implementation

def main():
  # read the data

  # for each pair
    # call longest_subsequence

    # write out result(s)

    # insert blank line
  pass  # placeholder: replace with your implementation

if __name__ == "__main__":
  main()

You may add more functions than those listed. You may use only the standard libraries in Python.

For this assignment you may work with a partner. Both of you must read the paper on Pair Programming and abide by the ground rules as stated in that paper. If you are working with a partner, then only one of you will submit the code. Make sure that the header of the file you submit on Gradescope has your name and UT EID and your partner's name and UT EID. If you are working alone, then the header will have just your name and your UT EID.

Use the Canvas program to submit your DNA.py file. We should receive your work by 11 PM on Monday, 16 Jan 23. There will be substantial penalties if you do not adhere to the guidelines.