CS378H: Concurrency: Honors

FPGAs!

The goal of this assignment is to use hardware-level parallelism as well as platform-level concurrency to solve a classic genomics problem: sequence alignment. You will gain experience programming FPGAs in Verilog and thinking about the kind of parallelism exposed by hardware, as well as experience with heterogeneous (accelerator-based) programming, which uses hardware specialized to particular programming tasks.

Sequence Alignment

Roughly speaking, sequence alignment refers to a class of algorithms that compare nucleotide sequences, for example to determine measures of genetic similarity. There are multiple approaches to the problem, but the one we are interested in for this lab is often referred to as an optimal matching problem, or a global alignment problem. Specifically, given two DNA sequences represented as strings over the alphabet {A,C,G,T}, align those two strings such that edit distance is minimized. Consider the two DNA sequences below:

      
	ACGTTGCAGG
	GTTGCAGGAT

The sequences can be "aligned" in several ways, with each way yielding a similarity metric that corresponds to the number of positions at which the letters did or did not match. Concretely, at each position, we assign a score value of 1 when letters match, and -1 when they do not. A number of possible alignments along with their scores are shown below. The optimal alignment among those shown is the one with the highest score (equivalently, the minimum edit distance); here it is the last one, with a score of 4. Note that in general, there may be more than one optimal alignment for a given pair of sequences.

      
 -  A  C  G  T  T  G  C  A  G  G
 G  T  T  G  C  A  G  G  A  T  -
-1 -1 -1  1 -1 -1  1 -1  1 -1 -1

      
 -  -  A  C  G  T  T  G  C  A  G  G
 G  T  T  G  C  A  G  G  A  T  -  -
-1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1

      
 A  C  G  T  T  G  C  A  G  G  -  -  -  -
 -  -  -  -  G  T  T  G  C  A  G  G  A  T
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

      
 A  C  G  T  T  G  C  A  G  G  -  - 
 -  -  G  T  T  G  C  A  G  G  A  T
-1 -1  1  1  1  1  1  1  1  1 -1 -1

SCORE (first alignment): (8*-1)+(3*1) = -5
SCORE (second alignment): (11*-1)+(1*1) = -10
SCORE (third alignment): (14*-1)+(0*1) = -14
SCORE (fourth alignment): (4*-1)+(8*1) = 4
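The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `score_alignment` and its parameters are hypothetical and not part of any provided code. It takes two strings that have already been padded with '-' to equal length and applies the +1/-1 rule position by position.

```python
def score_alignment(top: str, bottom: str, match: int = 1, penalty: int = -1) -> int:
    """Score two already-aligned strings of equal length.

    A position scores `match` when both letters are equal and neither is a
    gap ('-'); every other position (mismatch or gap) scores `penalty`.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        total += match if (a == b and a != "-") else penalty
    return total


# The four candidate alignments shown above, in order.
alignments = [
    ("-ACGTTGCAGG", "GTTGCAGGAT-"),
    ("--ACGTTGCAGG", "GTTGCAGGAT--"),
    ("ACGTTGCAGG----", "----GTTGCAGGAT"),
    ("ACGTTGCAGG--", "--GTTGCAGGAT"),
]
for top, bottom in alignments:
    print(top, bottom, score_alignment(top, bottom))
```

Run as written, the four printed scores are -5, -10, -14, and 4, matching the SCORE lines above.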

Note that the problem is enhanced (complicated) by the possibility of insertions or deletions mid-sequence. For example, given the two sequences below, the optimal alignment that follows them shows that 'T' was inserted at (0-indexed) position 3 in the first string (or deleted from the second) and 'G' was inserted at position 7 in the second string (or deleted from the first). In keeping with the CS community's love of jargon, these insertions/deletions are called "INDELs."

ACGTTGCAGT
ACGTGCGAGT

 0  1  2  3  4  5  6  7  8  9 10
--------------------------------
 A  C  G  T  T  G  C  -  A  G  T
 A  C  G  -  T  G  C  G  A  G  T
 1  1  1 -1  1  1  1 -1  1  1  1

SCORE: (2*-1)+(9*1) = 7

In the most general form of the problem, it is possible to assign different weights to different pairwise combinations; for example, a mismatch of G+C might contribute -5 while a mismatch of G+T contributes -10. Additionally, there are numerous extensions in which additional letters may be added to represent ambiguity when more than one kind of nucleotide could occur at a position (e.g. R, purine, can represent an ambiguous choice between G and A). Since, for this class, we are interested less in the algorithmic nuances and more in the parallelization and concurrency aspects, your implementation will use the {A,C,G,T} alphabet for DNA, and will use the basic scoring scheme in which a match is worth 1, while mismatches and INDELs are worth -1.
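To make the idea of per-pair weights concrete, here is a minimal Python sketch. The function `pair_score` and the `weights` table are hypothetical names used for illustration; the two weights are the example values from the paragraph above, and every unlisted pair falls back to the basic -1 rule that this lab actually uses.

```python
def pair_score(a: str, b: str, weights: dict, match: int = 1, default: int = -1) -> int:
    """Score one aligned position under an (illustrative) weighted scheme."""
    if a == b and a != "-":
        return match
    # frozenset makes the lookup order-independent: G+C is the same pair as C+G.
    return weights.get(frozenset((a, b)), default)


# Example weights taken from the text; any pair not listed uses `default`.
weights = {frozenset("GC"): -5, frozenset("GT"): -10}

print(pair_score("G", "C", weights))  # weighted mismatch
print(pair_score("A", "T", weights))  # unlisted mismatch, falls back to -1
print(pair_score("A", "A", weights))  # match
```

With an empty `weights` table this reduces to the basic scheme required for the lab.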

The Algorithm

The classic algorithm for global alignment relies on dynamic programming. Given two sequences over the alphabet {A, C, G, T}, the first step is to construct a table whose columns are labeled with the letters of the first input string (call it S1) and whose rows are labeled with the second. For the sample strings we used above, the initial table would look like:

		A	C	G	T	T	G	C	A	G	G

G
T
T
G
C
A
G
G
A
T

The cells at row and column index 1 are initialized with the negative value of the corresponding index in the string. Cells are addressed as [column, row], with both indices starting at 1. For example, [1,1] corresponds to index 0 in both S1 and S2, and is initialized to 0; [2,1] becomes -1, [3,1] becomes -2, and so on, as shown below in step 1. The goal of the algorithm is to fill in the table with scores that represent all possible alignments of the strings. At each cell of the table a "local score" is computed, which records whether the {A, C, G, T} values at the row and column headers for the cell match. A match gets a value of 1, while a mismatch gets -1. For example, in the table above, at [2,2], A is compared against G, which is a mismatch, so the local score contribution is -1, while at [2,7] A matches A for a local score contribution of +1. The total value at any given cell is the maximum of:

a) the local score plus the score to the upper left (corresponding to a match/mismatch),
b) the score to the left plus the value of an INDEL (-1),
c) the score above plus the value of an INDEL (-1).

Note that the minimum edit distance corresponds to the maximum score, and that a score taken from above or from the left corresponds to an INDEL, or gap, in the alignment. The table is filled in by moving down and to the right, filling in scores as the cells upon which they depend become available. For the example alignment we've been considering, the first four steps and the final step are shown below.

STEP 1

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1
T	-2
T	-3
G	-4
C	-5
A	-6
G	-7
G	-8
A	-9
T	-10

Initialize the first row and column.

STEP 2

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1	-1
T	-2
T	-3
G	-4
C	-5
A	-6
G	-7
G	-8
A	-9
T	-10

G[2,2] = max(G[1,1] + local score(-1), G[1,2] + INDEL, G[2,1] + INDEL) = max(0+-1, -1+-1, -1+-1) = -1

STEP 3

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1	-1	-2
T	-2	-2	-2
T	-3
G	-4
C	-5
A	-6
G	-7
G	-8
A	-9
T	-10

G[3,2] = max(G[2,1] + local score(-1), G[2,2] + INDEL, G[3,1] + INDEL) = max(-1+-1, -1+-1, -2+-1) = -2
G[2,3] = max(G[1,2] + local score(-1), G[2,2] + INDEL, G[1,3] + INDEL) = max(-1+-1, -1+-1, -2+-1) = -2
G[3,3] = max(G[2,2] + local score(-1), G[2,3] + INDEL, G[3,2] + INDEL) = max(-1+-1, -2+-1, -2+-1) = -2

STEP 4

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1	-1	-2	-1
T	-2	-2	-2	-2
T	-3	-3	-3	-3
G	-4
C	-5
A	-6
G	-7
G	-8
A	-9
T	-10

G[4,2] = max(G[3,1] + local score(1), G[3,2] + INDEL, G[4,1] + INDEL) = -1
G[4,3] = max(G[3,2] + local score(-1), G[3,3] + INDEL, G[4,2] + INDEL) = -2
G[2,4] = max(G[1,3] + local score(-1), G[2,3] + INDEL, G[1,4] + INDEL) = -3
...etc...

...

FINAL STEP

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1	-1	-2	-1	-2	-3	-4	-5	-6	-7	-8
T	-2	-2	-2	-2	0	-1	-2	-3	-4	-5	-6
T	-3	-3	-3	-3	-1	1	0	-1	-2	-3	-4
G	-4	-4	-4	-2	-2	0	2	1	0	-1	-2
C	-5	-5	-3	-3	-3	-1	1	3	2	1	0
A	-6	-4	-4	-4	-4	-2	0	2	4	3	2
G	-7	-5	-5	-3	-4	-3	-1	1	3	5	4
G	-8	-6	-6	-4	-4	-4	-2	0	2	4	6
A	-9	-7	-7	-5	-5	-5	-3	-1	1	3	5
T	-10	-8	-8	-6	-4	-4	-4	-2	0	2	4

G[11,2] = max(G[10,1] + local score(1), G[11,1] + INDEL, G[10,2] + INDEL) = -8
G[11,3] = max(G[10,2] + local score(-1), G[11,2] + INDEL, G[10,3] + INDEL) = -6
...etc...
G[11,11] = max(G[10,10] + local score(-1), G[11,10] + INDEL, G[10,11] + INDEL) = 4
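The fill procedure walked through above can be sketched sequentially in Python. This is a minimal illustration, not the sample solution: the name `fill_table` is hypothetical, and the sketch uses 0-indexed `grid[row][col]`, so the cell written as G[c,r] in the text is `grid[r-1][c-1]` here. It fills the table row by row, which respects the same dependencies as the step order shown above.

```python
def fill_table(s1: str, s2: str, match: int = 1, mismatch: int = -1, indel: int = -1):
    """Return the score grid for aligning s1 (columns) against s2 (rows)."""
    cols, rows = len(s1) + 1, len(s2) + 1
    grid = [[0] * cols for _ in range(rows)]
    # First row and column: cumulative INDEL penalties 0, -1, -2, ...
    for c in range(1, cols):
        grid[0][c] = c * indel
    for r in range(1, rows):
        grid[r][0] = r * indel
    for r in range(1, rows):
        for c in range(1, cols):
            local = match if s1[c - 1] == s2[r - 1] else mismatch
            grid[r][c] = max(
                grid[r - 1][c - 1] + local,  # upper left: match/mismatch
                grid[r][c - 1] + indel,      # left: INDEL
                grid[r - 1][c] + indel,      # above: INDEL
            )
    return grid


grid = fill_table("ACGTTGCAGG", "GTTGCAGGAT")
for row in grid:
    print("\t".join(str(v) for v in row))
print("score:", grid[-1][-1])
```

For the running example the lower-right entry is 4, and the printed rows agree with the final table above.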

Once the score table has been filled in, the optimal alignment(s) correspond to paths traced from the lower right [maxcols-1,maxrows-1] to the zero at the upper left at [1,1]. In this case, the optimal alignment yields a score of 4. In the following table, the path corresponding to the optimal alignment runs from the 4 at [11,11] up through the 5 at [11,10] to the 6 at [11,9], then diagonally up and to the left through the scores 5, 4, 3, 2, 1, 0, and -1 to the -2 at [3,1], and finally left along the top row to the 0 at [1,1]. Recovering optimal alignments is a matter of tracing back from the lower right to the upper left. Each cell corresponds to an alignment entry pair (two letters, or one letter and an INDEL), and the entry before it can be recovered by deducing which score above it, to the left, or to the upper left is optimal, and therefore contributed to the total score at that cell (there may be more than one option in the general case). For example, at [10,8] the score 5 corresponds to a match (G+G); the optimal alignment at the slot before it can be found by observing that the score of 4 at [9,7] must have preceded it, since the slot does not correspond to an INDEL, and the 4 at [9,7] plus the value of the match at [10,8] yields the observed score. You may find it expedient in your own implementation to simply keep track of which preceding cell contributed to the score at each cell.

		A	C	G	T	T	G	C	A	G	G
	0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-10
G	-1	-1	-2	-1	-2	-3	-4	-5	-6	-7	-8
T	-2	-2	-2	-2	0	-1	-2	-3	-4	-5	-6
T	-3	-3	-3	-3	-1	1	0	-1	-2	-3	-4
G	-4	-4	-4	-2	-2	0	2	1	0	-1	-2
C	-5	-5	-3	-3	-3	-1	1	3	2	1	0
A	-6	-4	-4	-4	-4	-2	0	2	4	3	2
G	-7	-5	-5	-3	-4	-3	-1	1	3	5	4
G	-8	-6	-6	-4	-4	-4	-2	0	2	4	6
A	-9	-7	-7	-5	-5	-5	-3	-1	1	3	5
T	-10	-8	-8	-6	-4	-4	-4	-2	0	2	4

The path shown in the table above corresponds to the optimal alignment:


 A  C  G  T  T  G  C  A  G  G  -  - 
 -  -  G  T  T  G  C  A  G  G  A  T
-1 -1  1  1  1  1  1  1  1  1 -1 -1
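The traceback can be sketched as follows, using the "keep track of which preceding cell contributed" approach suggested above. This is one possible host-side sketch under stated assumptions: the name `align`, the "D"/"L"/"U" pointer labels, and the tie-breaking order (upper left first, then left, then above) are all choices made here for illustration. When several alignments are optimal, a different tie-break may return a different but equally good alignment.

```python
def align(s1: str, s2: str, match: int = 1, mismatch: int = -1, indel: int = -1):
    """Return (aligned_s1, aligned_s2, score) for one optimal global alignment."""
    rows, cols = len(s2) + 1, len(s1) + 1
    score = [[0] * cols for _ in range(rows)]
    # back[r][c] records which neighbour produced score[r][c]:
    # "D" = upper left (match/mismatch), "L" = left (INDEL), "U" = above (INDEL).
    back = [[None] * cols for _ in range(rows)]
    for c in range(1, cols):
        score[0][c], back[0][c] = c * indel, "L"
    for r in range(1, rows):
        score[r][0], back[r][0] = r * indel, "U"
    for r in range(1, rows):
        for c in range(1, cols):
            local = match if s1[c - 1] == s2[r - 1] else mismatch
            candidates = [
                (score[r - 1][c - 1] + local, "D"),
                (score[r][c - 1] + indel, "L"),
                (score[r - 1][c] + indel, "U"),
            ]
            # max() keeps the first candidate on ties, so "D" wins over "L" over "U".
            score[r][c], back[r][c] = max(candidates, key=lambda t: t[0])

    # Walk the pointers from the lower right back to the upper left.
    out1, out2 = [], []
    r, c = rows - 1, cols - 1
    while r > 0 or c > 0:
        move = back[r][c]
        if move == "D":
            out1.append(s1[c - 1]); out2.append(s2[r - 1]); r -= 1; c -= 1
        elif move == "L":
            out1.append(s1[c - 1]); out2.append("-"); c -= 1
        else:
            out1.append("-"); out2.append(s2[r - 1]); r -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), score[-1][-1]


a1, a2, best = align("ACGTTGCAGG", "GTTGCAGGAT")
print(a1)
print(a2)
print(best)
```

For the running example this prints ACGTTGCAGG--, then --GTTGCAGGAT, then 4, which is the alignment shown above.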

The Implementation

You will develop your FPGA implementation using a runtime and JIT compiler called Cascade. You will use that toolchain to develop your Verilog code, and run it either in simulation, or on an Intel/Terasic DE10-nano FPGA board. Unlike other labs where we give you latitude to select an implementation platform and language, you must code in Verilog for this lab.

Due to COVID-19 and social distancing policies, the DE10-nano boards the department would typically loan you for the duration of this lab cannot be practically distributed. For those with access to a DE10-nano, we encourage you to do the complete lab. If you do not have access to a DE10-nano (most of you), you are expected to code, debug, and measure entirely in Cascade's simulation environment.

Deliverables will be detailed below, but the focus is on a writeup that provides performance measurements as graphs, and answers (perhaps speculatively) a number of questions. Spending some time setting yourself up to quickly and easily collect and visualize performance data is a worthwhile time investment as with other labs in this course.

Step 1: Create a sequential host-based solution

In step 1 of the lab, you will write a program that accepts command-line parameters to specify the following:

--S1:string of A,C,G,T
--S2:string of A,C,G,T

The output of your program should include:

The complete grid described above in CSV or TSV format.
The alignment as two strings of letters and '-' (INDEL) characters separated by a single new line. For example:
ACGTTGCAGG--
--GTTGCAGGAT
The score for the optimal alignment your implementation has selected.

For example, the following command line, invoked on moonstone.csres.utexas.edu, yields the corresponding CSV output in our sample solution:

./lab3 --S1 ACGTTGCAGG --S2 GTTGCAGGAT
,,A,C,G,T,T,G,C,A,G,G,
, 0 , -1 , -2 , -3 , -4 , -5 , -6 , -7 , -8 , -9 , -10 ,
G , -1 , -1 , -2 , -1 , -2 , -3 , -4 , -5 , -6 , -7 , -8 ,
T , -2 , -2 , -2 , -2 , 0 , -1 , -2 , -3 , -4 , -5 , -6 ,
T , -3 , -3 , -3 , -3 , -1 , 1 , 0 , -1 , -2 , -3 , -4 ,
G , -4 , -4 , -4 , -2 , -2 , 0 , 2 , 1 , 0 , -1 , -2 ,
C , -5 , -5 , -3 , -3 , -3 , -1 , 1 , 3 , 2 , 1 , 0 ,
A , -6 , -4 , -4 , -4 , -4 , -2 , 0 , 2 , 4 , 3 , 2 ,
G , -7 , -5 , -5 , -3 , -4 , -3 , -1 , 1 , 3 , 5 , 4 ,
G , -8 , -6 , -6 , -4 , -4 , -4 , -2 , 0 , 2 , 4 , 6 ,
A , -9 , -7 , -7 , -5 , -5 , -5 , -3 , -1 , 1 , 3 , 5 ,
T , -10 , -8 , -8 , -6 , -4 , -4 , -4 , -2 , 0 , 2 , 4 ,
ACGTTGCAGG--
--GTTGCAGGAT
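A minimal sketch of the command-line and output plumbing is shown below, assuming a Python host program. The helper names `parse_args` and `grid_to_csv` are hypothetical, and the CSV spacing differs slightly from the sample solution's output above (no padding around values, no trailing comma); only the `--S1` and `--S2` option names come from the lab description.

```python
import argparse
import csv
import io


def parse_args(argv=None):
    """Parse the two required sequence arguments."""
    parser = argparse.ArgumentParser(description="Global sequence alignment")
    parser.add_argument("--S1", required=True, help="first string over A,C,G,T")
    parser.add_argument("--S2", required=True, help="second string over A,C,G,T")
    return parser.parse_args(argv)


def grid_to_csv(s1: str, s2: str, grid) -> str:
    """Render a filled score grid with S1 as column labels and S2 as row labels."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Two empty cells: one for the row-label column, one for the initial column.
    writer.writerow(["", ""] + list(s1))
    # The first grid row (the initialization row) has no letter label.
    for label, row in zip(" " + s2, grid):
        writer.writerow([label.strip()] + list(row))
    return buf.getvalue()


# Tiny hand-filled grid for S1 = "A", S2 = "A", just to show the layout.
args = parse_args(["--S1", "A", "--S2", "A"])
print(grid_to_csv(args.S1, args.S2, [[0, -1], [-1, 1]]), end="")
```

In a real program the grid would come from your table-filling code, and the two alignment strings and the score would be printed after it.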

Step 2: Cascade Implementation

The algorithm described above yields some very natural parallelizations for FPGAs. You will use Cascade to implement the algorithm in Verilog. Cascade's README.md provides a good overview of how to use Cascade. The README.md should be considered mandatory reading whether you wish to set Cascade up on your own system or use our virtual machine image, Cascade.ova. In particular, if you plan to use Cascade on anything other than a Linux host, you will need to use the virtual machine layer, as Cascade currently runs only on a Linux stack.
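One way to see where the parallelism comes from (an observation offered here as a sketch, not a required design) is that every cell depends only on its left, upper, and upper-left neighbours. Cells on the same anti-diagonal (row + column constant) therefore never depend on each other and could all be evaluated at the same time. The Python below models this with a hypothetical function `fill_by_antidiagonals`; the inner loop stands in for work that hardware could do concurrently, and the grid starts out as `None` so that any dependency violation would raise an error.

```python
def fill_by_antidiagonals(s1: str, s2: str, match: int = 1, mismatch: int = -1, indel: int = -1):
    """Fill the score grid one anti-diagonal ("wave") at a time.

    Returns (grid, waves), where waves is the number of anti-diagonals visited.
    """
    rows, cols = len(s2) + 1, len(s1) + 1
    grid = [[None] * cols for _ in range(rows)]
    waves = 0
    for d in range(rows + cols - 1):  # d = r + c
        r_lo, r_hi = max(0, d - (cols - 1)), min(rows - 1, d)
        # Every cell in this loop reads only cells from waves d-1 and d-2,
        # so the iterations are independent of one another.
        for r in range(r_lo, r_hi + 1):
            c = d - r
            if r == 0:
                grid[r][c] = c * indel
            elif c == 0:
                grid[r][c] = r * indel
            else:
                local = match if s1[c - 1] == s2[r - 1] else mismatch
                grid[r][c] = max(
                    grid[r - 1][c - 1] + local,
                    grid[r][c - 1] + indel,
                    grid[r - 1][c] + indel,
                )
        waves += 1
    return grid, waves


grid, waves = fill_by_antidiagonals("ACGTTGCAGG", "GTTGCAGGAT")
print(grid[-1][-1], waves)
```

For two strings of length 10 the grid has 121 cells but only 21 anti-diagonals, which hints at why a hardware implementation might scale differently from a sequential one.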

The instructor is able to coordinate the loaning out of a small number of DE10-nano boards, but this is discouraged due to COVID-19. If you feel strongly that you want to use one, contact the instructor.

Cascade has some very nice properties that you should find helpful for this lab. In particular, it allows you to do "printf"-style debugging using the "$display" system task, which is otherwise impossible on FPGA hardware. More importantly, Cascade is a JIT compiler that encapsulates the programming of the actual FPGA hardware behind software emulation, allowing you to run/test/debug your changes immediately, rather than waiting for a lengthy hardware compilation to complete. It also has features for managing inputs and outputs using file I/O, which we will rely on for this lab.

Your implementation will accept inputs in a *.mem file, and will produce as output the complete grid, also in a *.mem file. We will provide tools and skeleton code for getting inputs and outputs to/from the FPGA.

Your implementation will have the following inputs and outputs:

PARAMETER: LENGTH: the number of characters in each alignment sequence. As a simplification, we will assume both are of equal length.
PARAMETER: CWIDTH: the number of bits per character in each string. This should default to 2 (A,C,G,T).
PARAMETER: SWIDTH: the number of bits in the output score. This should default to 16 (a 16-bit integer).
PARAMETER: MATCH: the signed integer score for a matching pair in the alignment (default is 1).
PARAMETER: INDEL: the signed integer score for an insert/delete in the alignment (default is -1).
PARAMETER: MISMATCH: the signed integer score for a mis-matching pair in the alignment (default is -1).
input: s1: the first string to align.
input: s2: the second string to align.
output: score: the score for the best alignment.

The skeleton code we provide (main.v, constants.v, and debug.v) will help you create and manage inputs, and demonstrates how to use Cascade's "$display" debugging tools. The cascade-files/nw.v file is where you will write your code. We strongly recommend you decompose your solution by implementing a module that describes a single cell that does comparisons for a point in the grid described above, and a top-level module that composes those cells into a grid. The Verilog excerpt below, from main.v, uses Cascade's I/O support to instantiate your top-level module and populate its inputs.

// Instantiate your top-level Needleman-Wunsch module:
wire [LENGTH*CWIDTH-1:0] s1 = rdata[2*LENGTH*CWIDTH-1:1*LENGTH*CWIDTH];
wire [LENGTH*CWIDTH-1:0] s2 = rdata[1*LENGTH*CWIDTH-1:0*LENGTH*CWIDTH];
wire signed[SWIDTH-1:0] score;
YOUR_TOP_LEVEL_MODULE#(
  .LENGTH(LENGTH),
  .CWIDTH(CWIDTH),
  .SWIDTH(SWIDTH),
  .MATCH(MATCH),
  .INDEL(INDEL),
  .MISMATCH(MISMATCH) 
) grid (
  .s1(s1),
  .s2(s2),
  .score(score)
);
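To make the bit slicing in the excerpt above concrete, here is a small Python model of it. The excerpt itself shows that s1 occupies the upper LENGTH*CWIDTH bits of rdata and s2 the lower LENGTH*CWIDTH bits. Everything else in this sketch is an assumption made for illustration: the letter-to-bits encoding in `CODES`, the choice to place the first character in the most significant bits, and the helper names. Check the provided tools and constants.v for the encoding actually used.

```python
# ASSUMED 2-bit encoding, for illustration only; the skeleton code defines the real one.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str, cwidth: int = 2) -> int:
    """Pack a string into an integer, first character in the most significant bits."""
    word = 0
    for ch in seq:
        word = (word << cwidth) | CODES[ch]
    return word


def pack_pair(s1: str, s2: str, cwidth: int = 2) -> int:
    """Build one input word with s1 in the upper half and s2 in the lower half."""
    if len(s1) != len(s2):
        raise ValueError("both sequences must have LENGTH characters")
    return (pack(s1, cwidth) << (len(s2) * cwidth)) | pack(s2, cwidth)


def split_pair(rdata: int, length: int, cwidth: int = 2):
    """Mirror the two wire declarations: the upper half is s1, the lower half is s2."""
    mask = (1 << (length * cwidth)) - 1
    return (rdata >> (length * cwidth)) & mask, rdata & mask


rdata = pack_pair("ACGT", "TGCA")
s1_bits, s2_bits = split_pair(rdata, length=4)
print(hex(rdata), hex(s1_bits), hex(s2_bits))
```

With LENGTH = 4 and CWIDTH = 2, each string occupies 8 bits and the combined word is 16 bits wide.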

The subsequent Verilog code in main.v manages the clock signal and inputs/outputs, and waits until your code has computed the score:

// While there are still inputs coming out of the fifo, print the results:
reg once = 0;
always @(posedge clock.val) begin
  // Base case: Skip first input when fifo hasn't yet reported values
  if (!once) begin 
    once <= 1;
  end 
  // Edge case: Stop running when the fifo reports empty
  else if (empty) begin
    $finish(1);
  end 
  // Common case: Print results as they become available
  else begin
    $display("align(%h,%h) = %d", s1, s2, score);
  end
end

You will implement your code in nw.v, which is included by main.v and debug.v. The debug.v file is similar to main.v, with the exception that it uses "$display" to show the intermediate state of your internal FPGA logic. In your writeup, provide the following graphs and answer the following questions:

A graph showing the runtime on inputs of length 8,16,32,64 using your sequential CPU solution.
A graph showing the runtime on inputs of length 8,16,32,64 using cascade running in software emulation only mode.
A graph showing the runtime on inputs of length 8,16,32,64 using cascade running in hardware on the DE10 board.
How do you explain the differences in runtime and scalability between your CPU solution and your Verilog version?
How do you explain the differences in runtime and scalability between your Verilog version running in software and the same version running in hardware?
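For the CPU measurements, a small timing harness along the following lines may save time. It is a sketch under stated assumptions: `time_alignment` and `random_dna` are hypothetical helper names, the inputs are random strings generated from a fixed seed, and the workload passed in at the bottom is only a placeholder that you would replace with your own alignment function.

```python
import random
import time


def random_dna(length: int, rng: random.Random) -> str:
    """Generate a random string over A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def time_alignment(align_fn, lengths=(8, 16, 32, 64), repeats=5, seed=0):
    """Return {length: best-of-`repeats` wall-clock seconds} for align_fn(s1, s2)."""
    rng = random.Random(seed)
    results = {}
    for n in lengths:
        s1, s2 = random_dna(n, rng), random_dna(n, rng)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            align_fn(s1, s2)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


# Placeholder workload with the same O(n^2) shape; substitute your own solution.
placeholder = lambda a, b: sum(x == y for x in a for y in b)

print("length,seconds")
for n, seconds in time_alignment(placeholder).items():
    print(f"{n},{seconds:.9f}")
```

The CSV printed at the end can be fed directly to whatever plotting tool you use for the writeup graphs.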

If you are not working on a Linux desktop or laptop where you have sudo privilege (which is recommended but may not be practical), you will need to bring up the lab in a Virtual Machine. Instructions for using VirtualBox to create and use a VM with Windows, MacOS, and Linux can be found at:

Cascade Environment Instructions

For those who want to bring up their own VM and manage cascade installation on the DE10 themselves:
Windows 10-Specific VM and DE10 Bringup Instructions
MacOS-Specific VM and DE10 Bringup Instructions

If you are using the Cascade.ova VM with VirtualBox, there are a few things you need to know:

You need to have the VirtualBox 5.2.20 Oracle VM VirtualBox Extension Pack installed because the VM depends on it for USB 3.0 support.
The super-secure password is, you guessed it, password. This might be worth knowing if you let your VM go to sleep and hit the lock screen.
The Quartus tools are already installed on the VM.
The cascade source tree is already part of the image. There is a folder for it on the Desktop. If you're like the instructor and can't find things without a command prompt, this means you should cd ~/Desktop/cascade to find it.
If there are fixes (god forbid) to the Cascade source tree during this lab, you should be able to update it with git pull; make clean; make;

Note that some of the instructions in these files are specific to getting Cascade's JIT to work, and you don't strictly need it to complete the lab or the measurements. If you have a correct Verilog implementation, and you're able to connect to the DE10, running the lab with --march de10 should enable hardware measurements and does not invoke JIT compilation.

Step 3: Extra Credit Options

There are three options for extra credit:

Extract the optimal alignment on the FPGA. Your solution in step 2 produced only the alignment grid, and deferred the arguably harder problem of recovering the optimal alignment from the grid to host code. The potential for INDELs in the alignment introduces variability in the length of the optimal alignment, which necessitates a state machine to recover it in hardware.
Parallel Host Solution. Your step 1 solution is sequential on the host CPU. For extra credit, you may parallelize the host version using whatever methods/languages you are inclined to use. If you do this option, your writeup should provide a detailed description of your implementation strategy, and an additional graph on inputs of length 8, 16, 32, 64, along with any additional sizes (e.g. 256, 1024, 8192) that make it clear how the scalability of your parallel CPU solution compares against your sequential CPU solution. The additional sizes are suggested because memory limits the FPGA implementation on Cascade in a way that shouldn't necessarily limit your CPU solution, and you may be able to observe scaling trends on the CPU at larger sizes that you can't see at the smaller input sizes.
Better Verilog timing measurements. Chances are that your Verilog code can finish executing before Cascade can compile a bitstream to move your computation to the DE10-Nano. To get a better idea of exactly how long your program takes to execute, we've prepared another Verilog file for you: time.v. This file lets you set the desired input string size via the HLENGTH parameter and will report the exact number of cycles taken by your implementation. This number will be accurate regardless of which march flag you use. You can then regenerate your FPGA scaling graphs using cycle count as the runtime. For comparing against C++, assume your code is running on the DE10-Nano at a frequency of 50MHz and graph the expected hardware runtime. This may be a bit idealized, but should give you a good idea of how fast FPGAs can be.
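Converting a cycle count into an expected hardware runtime is simple arithmetic: runtime = cycles / frequency. The sketch below applies the 50MHz assumption stated above. The function name is hypothetical, and the cycle counts in the loop are made-up placeholders rather than measurements; use the numbers reported by time.v for your own implementation.

```python
def cycles_to_seconds(cycles: int, clock_hz: int = 50_000_000) -> float:
    """Expected runtime for a given cycle count at the assumed clock frequency."""
    return cycles / clock_hz


# Placeholder cycle counts, NOT real measurements.
for length, cycles in [(8, 100), (16, 200), (32, 400), (64, 800)]:
    microseconds = cycles_to_seconds(cycles) * 1e6
    print(f"length={length} cycles={cycles} expected_runtime_us={microseconds:.2f}")
```

At 50MHz each cycle takes 20 nanoseconds, so 1000 cycles correspond to 20 microseconds.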

Deliverables

Using the Canvas turn-in utility, you should turn in, along with your code, Makefiles, and measurement scripts, a brief writeup with the scalability graphs requested above. Be sure that your writeup includes sufficient text to enable us to understand which graphs are which. Note that, as with other labs in this course, we will check solutions for plagiarism using Moss.

One of the goals of using cascade is to aid research efforts in improving the programmability of FPGAs. To this end, cascade is instrumented to collect information about compile times and compiler errors/successes that can be used in a subsequent (anonymized!) study. Cascade will produce a file called "cascade-log" in your home directory. We hope you will include this file in your submission out of support for the "good fight" that is computer science research. However, we will also provide an additional 5 points of extra credit for anyone who turns this file in with their submission. Thanks in advance for helping the research effort!

A LaTeX template that includes placeholders for graphs and re-iterates any questions we expect answers for can be found here, (a build of that template is here).

Please report how much time you spent on the lab.

Acknowledgements

Thanks to Eric Schkufza and Michael Wei of VMware Research Group for supporting this lab. Thanks to our department head Don Fussell for supporting the project by helping find funds to enable us to loan DE10 hardware to every student.
