CS 395T Architecture and Applications of Biological Databases

Spring 2004

Reading List for Student Presentations


NEW: Presentations on April 20

1. Tudor I. Oprea, Ismael Zamora, and Anna-Lena Ungell, Pharmacokinetically Based Mapping Device for Chemical Space Navigation. J. Comb. Chem. 2002, 4, 258-266. (Bongjune Kwon)

2. Gösta Grahne, Raul Hakli, Matti Nykänen, Hellis Tamm and Esko Ukkonen: "Design and Implementation of a String Database Query Language". To appear as Information Systems 28(4), pages 311-337, 2003. (Ving Lei)


  1. MoBIoS
    1. Daniel Miranker, Weijia Xu, Rui Mao. "MoBIoS: A Metric-Space DBMS to Support Biological Discovery", 2003. (long version of SSDBM03)
    2. Weijia Xu, Daniel P. Miranker. "A Metric Model of Amino Acid Substitution", (to appear, J. of Bioinformatics) 2004.
    3. Weijia Xu, Daniel P. Miranker, Rui Mao, Shu Wang. "Indexing Protein Sequences in Metric Space", 2003.
    4. Daniel P. Miranker, Wenguo Liu, Weijia Xu, Rui Mao. "Sequence View: A Database Mechanism for Biosequences," 2003.
    5. Weijia Xu, Willard J Briggs, Joanna Padolina, Wenguo Liu, C. Randal Linder, Daniel P. Miranker. "Using MoBIoS' Scalable Genome Joins to Find Conserved Primer Pair Candidates Between Two Genomes," 2004.

  2. Metric-Space Indexing

    1. S. Brin. Near neighbor search in large metric spaces. In Proc. VLDB'95, pages 574--584, 1995.
    2. E. Chavez, G. Navarro, J. L. Marroquin, "Searching in Metric Spaces", ACM Computing Surveys, 2001.

  3. Biological Information Retrieval, Sequence and/or Structure

    1. H.E. Williams and J. Zobel. Indexing and Retrieval for Genomic Databases, IEEE Transactions on Knowledge and Data Engineering, 14(1), 63--78, 2002. [pdf]
    2. Arnab Bhattacharya, Tolga Can, Tamer Kahveci, Ambuj K. Singh, Yuan-Fang Wang, ProGreSS: Simultaneous Searching of Protein Databases by Sequence and Structure, PSB 2004. [ps] [pdf]
    3. Orhan Camoglu, Tamer Kahveci, Ambuj K. Singh, Towards Index-based Similarity Search for Protein Structure Databases, CSB 2003. [ps] [pdf]
    4. Eran Halperin, Jeremy Buhler, Richard Karp, Robert Krauthgamer and Ben Westover, Detecting Protein Sequences Via Metric Embeddings. Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology (ISMB 2003), 122-129, Brisbane, Australia, 2003.

  4. Biological Data Models
    1. Karp, P.D., Pathway Databases: A Case Study in Computational Symbolic Theories, Science, vol. 293, pp. 2040-2044, 2001. [html]
    2. Luay Nakhleh, Daniel Miranker, Francois Barbancon, William H. Piel, Michael Donoghue. "Requirements of Phylogenetic Databases," BIBE, 2003. (Cara Stockham, 03/02/04)
    3. Brazma et al., Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Genome Biology 2002, 3:research0046.1-0046.9 [link]
    4. Paul T Spellman et al., Design and implementation of microarray gene expression markup language (MAGE-ML), Genome Biology 2002, 3(9) (published 23 August 2002) [link] [http://www.mged.org/]
    5. Chris F Taylor et al., A systematic approach to modelling, capturing and disseminating proteomics experimental data, Nature Biotechnology, March 2003, Volume 21, Number 3, pp 247-254 [http://pedro.man.ac.uk/home.shtml] [pdf] (Smriti Ramakrishnan, 03/02/04)
    6. TBD: A paper relating GenBank/ASN.1/XML

  5. Biological Query Languages

    1. Sandeep Tata, Jignesh M. Patel. "PiQA: An Algebra for Querying Protein Data Sets," SSDBM 2003.
    2. J. Hammel, M. Schneider. Genomics Algebra: A New, Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information. CIDR'02, Asilomar, California, USA, 2002, 176-187.
    3. Gösta Grahne, Raul Hakli, Matti Nykänen, Hellis Tamm and Esko Ukkonen: "Design and Implementation of a String Database Query Language". To appear as Information Systems 28(4), pages 311-337, 2003. (Special issue on bioinformatics and biological data management.)
    4. Alberto Lerner, Dennis Shasha. AQuery: Query Language for Ordered Data, Optimization Techniques, and Experiments.

  6. Comparative Genomics
    1. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003 Oct 10;302(5643):249-55. Epub 2003 Aug 21. PMID: 12934013 [html]
    2. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Nucleic Acids Res. 2001 Feb 1;29(3):774-82.
    3. Halfon MS, Grad Y, Church GM, Michelson AM. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 2002 Jul;12(7):1019-28. PMID: 12097338

  7. Other

    1. Tamer Kahveci, Christian Lang, Ambuj K. Singh, Joining Massive High-Dimensional Databases, ICDE 2003, Bangalore, India.