Rosetta Project Catch Up

 

You should properly set up Postgres and created a new database

Download source package here

You should download both Postgres and MySQL JDBC driver and properly set up them in your classpath

 

Milestone1

 

You can execute the following SQL to create database.

 

===============create tables SQL==================================

CREATE TABLE T_GOFunction (

            source VARCHAR ( 255 ) NOT NULL,

            function VARCHAR ( 255 ) NOT NULL,

            T_Protein_ID INTEGER NOT NULL,

            T_GOFunction_ID serial,

            CONSTRAINT PK_T_GOFunction17 PRIMARY KEY (T_GOFunction_ID)

            );

CREATE INDEX TC_T_GOFunction49 ON T_GOFunction (T_Protein_ID );

CREATE TABLE T_Match (

            fromA INTEGER NOT NULL,

            fromB INTEGER NOT NULL,

            toA INTEGER NOT NULL,

            toB INTEGER NOT NULL,

            score DOUBLE PRECISION NOT NULL,

            T_Protein_T_Protein_ID INTEGER NOT NULL,

            T_Protein_ID INTEGER NOT NULL,

            T_Match_ID serial PRIMARY KEY

            );

CREATE INDEX TC_T_Match44 ON T_Match (T_Protein_T_Protein_ID , T_Protein_ID );

CREATE TABLE T_Protein (

            accessionNumber VARCHAR ( 255 ) NOT NULL,

            name VARCHAR ( 255 ) NOT NULL,

            organism VARCHAR ( 255 ) NOT NULL,

            sequence TEXT NOT NULL,

            T_Protein_ID serial,

            T_ProteinSource_ID INTEGER NOT NULL,

            CONSTRAINT PK_T_Protein12 PRIMARY KEY (T_Protein_ID)

            );

CREATE INDEX TC_T_Protein43 ON T_Protein (T_ProteinSource_ID );

CREATE TABLE T_ProteinSource (

            dbName VARCHAR ( 255 ) NOT NULL,

            date VARCHAR ( 255 ) NOT NULL,

            T_ProteinSource_ID serial,

            CONSTRAINT PK_T_ProteinSource15 PRIMARY KEY (T_ProteinSource_ID)

            );

CREATE TABLE T_LocalAlignment (

            totalScore DOUBLE PRECISION NOT NULL,

            T_Protein_ID INTEGER NOT NULL,

            T_Protein_T_Protein_ID INTEGER NOT NULL,

            CONSTRAINT PK_T_LocalAlignment14 PRIMARY KEY (T_Protein_T_Protein_ID, T_Protein_ID)

            );

CREATE INDEX TC_T_LocalAlignment46 ON T_LocalAlignment (T_Protein_T_Protein_ID );

CREATE INDEX TC_T_LocalAlignment45 ON T_LocalAlignment (T_Protein_ID );

CREATE TABLE T_GeneFusionEvent (

            ComputedFunction SMALLINT,

            T_Protein_T_Protein_ID INTEGER NOT NULL,

            T_Protein_ID INTEGER NOT NULL,

            T_LocalAlignment_T_Protein_T_Protein_ID INTEGER NOT NULL,

            T_LocalAlignment_T_Protein_ID INTEGER NOT NULL,

            CONSTRAINT PK_T_GeneFusionEvent16 PRIMARY KEY (T_LocalAlignment_T_Protein_T_Protein_ID, T_LocalAlignment_T_Protein_ID, T_Protein_T_Protein_ID, T_Protein_ID)

            );

CREATE INDEX TC_T_GeneFusionEvent47 ON T_GeneFusionEvent (T_Protein_T_Protein_ID , T_Protein_ID );

CREATE INDEX TC_T_GeneFusionEvent48 ON T_GeneFusionEvent (T_LocalAlignment_T_Protein_T_Protein_ID , T_LocalAlignment_T_Protein_ID );

ALTER TABLE T_GeneFusionEvent ADD CONSTRAINT FK_T_GeneFusionEvent21 FOREIGN KEY (T_Protein_T_Protein_ID, T_Protein_ID) REFERENCES T_LocalAlignment (T_Protein_T_Protein_ID, T_Protein_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

ALTER TABLE T_GeneFusionEvent ADD CONSTRAINT FK_T_GeneFusionEvent22 FOREIGN KEY (T_LocalAlignment_T_Protein_T_Protein_ID, T_LocalAlignment_T_Protein_ID) REFERENCES T_LocalAlignment (T_Protein_T_Protein_ID, T_Protein_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

ALTER TABLE T_Protein ADD CONSTRAINT FK_T_Protein24 FOREIGN KEY (T_ProteinSource_ID) REFERENCES T_ProteinSource (T_ProteinSource_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

ALTER TABLE T_LocalAlignment ADD CONSTRAINT FK_T_LocalAlignment18 FOREIGN KEY (T_Protein_ID) REFERENCES T_Protein (T_Protein_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

ALTER TABLE T_LocalAlignment ADD CONSTRAINT FK_T_LocalAlignment19 FOREIGN KEY (T_Protein_T_Protein_ID) REFERENCES T_Protein (T_Protein_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

ALTER TABLE T_GOFunction ADD CONSTRAINT FK_T_GOFunction20 FOREIGN KEY (T_Protein_ID) REFERENCES T_Protein (T_Protein_ID)  ON DELETE NO ACTION ON UPDATE NO ACTION;

 

 

Milestone 2:

 

 

Modify the connection settings (i.e. host name, database name, user name, password)

 

You can first execute following to populate t_protein and t_proteinsorce table

 

java ProteinLoader bacil-gb.txt bacil Genbank 2006-5-1

java ProteinLoader ecoli-gb.txt ecoli Genbank 2006-5-1

java ProteinLoader ecoli-sp.txt ecoli Swissprot 2006-5-1

java ProteinLoader human-gb.txt human Genbank 2006-5-1

java ProteinLoader yeast-gb.txt yeast Genbank 2006-5-1

 

You then can run Milestone2,java to populate t_match and t_localalignment table.

 

 

==========================================================================================

Milestone3

 

Here is one way to do.  

 

You can first execute this query to make a symmetry t_match and t_localalignment tables.

 

insert into t_localalignment(t_protein_id, t_protein_t_protein_id, totalscore)
select t_protein_t_protein_id, t_protein_id, totalscore
from t_localalignment;
 
insert into t_match(froma, fromb, toa, tob, score, t_protein_t_protein_id, t_protein_id)
select fromb, froma, tob, toa, score, t_protein_id, t_protein_t_protein_id
from t_match;

 

Then you can execute the following query to populate the t_genefusionevent table.

 

INSERT INTO t_genefusionevent (t_protein_id, t_protein_t_protein_id,

            t_localalignment_t_protein_id, t_localalignment_t_protein_t_protein_id)

SELECT

la1.t_protein_id, la1.t_protein_t_protein_id, la2.t_protein_id, la2.t_protein_t_protein_id

FROM

t_protein p1, t_protein p2, t_localalignment la1, t_localalignment la2, t_match m1, t_match m2

WHERE la1.totalscore > 50

      AND la2.totalscore > 50

AND p1.organism = p2.organism

AND m2.t_protein_id = la2.t_protein_id

AND m2.t_protein_t_protein_id = la2.t_protein_t_protein_id

AND m1.t_protein_id = la1.t_protein_id

AND m1.t_protein_t_protein_id = la1.t_protein_t_protein_id

AND la1.t_protein_t_protein_id = la2.t_protein_t_protein_id

      AND p1.t_protein_id = la1.t_protein_id

AND p2.t_protein_id = la2.t_protein_id

AND m1.toB in

       (select Max(toB) from  t_match

         where t_protein_id = la1.t_protein_id

         and t_protein_t_protein_id = la1.t_protein_t_protein_id)

AND m2.fromB in

      (select Min(fromB) from t_match

where t_protein_id = la2.t_protein_id

and t_protein_t_protein_id = la2.t_protein_t_protein_id)

AND m1.toB<m2.fromB ;

 

========================================Milestone 4===============================

 

Milestone 4

 

First run following SQL to update the gene name in the table.

 

update t_protein set name='HIS4' where accessionnumber='gi|49065653';
update t_protein set name='TOP2' where accessionnumber='gi|1351262';
update t_protein set name='DPO3A_ECOLI' where accessionnumber='gi|118794';
update t_protein set name='DPO3E_ECOLI' where accessionnumber='gi|118805';
update t_protein set name='RFBK7_ECOLI' where accessionnumber='gi|585850';
update t_protein set name='PGM_ECOLI' where accessionnumber='gi|548495';
update t_protein set name='Q4MX95_BACCE' where accessionnumber='gi|47564317';

 

Then  you can run

 

>Java ProteinFunction

 

to populate the t_gofunction table.

 

Then you can run

 

>Java RosettaRelation

 

to see the verification results.

 

 

Then you should be all set for the Milestone 5.