Discoveries in bioinformatics yield new therapeutic interventions for disease by replacing expensive, time-consuming physical experiments with automated computational search. Public databases now contain experimentally determined sequence and structural information for hundreds of thousands of proteins, which allows fairly comprehensive digital investigation. Limitations of software and hardware, rather than a shortage of data, are now the restricting factor in gaining a deeper understanding of molecular function. Computational development of future drugs and treatments relies on many tools for processing, searching, and simulating these large databases. Improvements are needed to develop a core library of algorithms that underpins computational prediction and virtual screening for scientific discovery. Because available computer systems change constantly in form and architecture, these algorithms must be mapped efficiently to emerging architectures in order to keep tackling problems of increasing realism and significance.
Molecular sequence and structural information comes from a variety of experimental sources: gene arrays, X-ray diffraction, nuclear magnetic resonance, electron microscopy, and even light microscopy. Multiple sources are important because the methods that yield the highest-resolution views of molecular structures typically do not reflect function or behavior in natural environments. Geometric models of essential molecular assemblies are reconstructed from these multiple, often somewhat inconsistent, experimental datasets. Once a structural model of a critical biomolecular target (wild type and synthetic mutations) has been produced, each target is searched against a database of potential drug molecules to identify the candidates for which binding is most effective and most likely. The choice and modeling of the target, the choice of and search through appropriate drug databases, and the selection of the top leads for druggability all require massive computational analysis and assessment of sequence, structure, and function interactions. They also require effective methods for visualizing the proposed solutions and interactively tuning the optimization process. Computational leads are finally tested in laboratory experiments and further verified using microscopy.
CVC research on drug discovery spans this computational pipeline. It begins with improved tools for reconstructing accurate structural models from X-ray diffraction and electron microscopy data. Given these structural models, computationally efficient biochemical and physiological models for estimating the interaction between molecules and/or chemical compounds are being enhanced to reflect the true binding affinities more accurately. These models are integrated into fast search algorithms that identify the minimum-energy binding configurations of a target with a proposed drug. Underlying these specific challenges in computational biophysics and biochemistry are more fundamental problems in computational science, including the fast Fourier transform, the fast multipole method, and variational methods for solving inverse problems. To harness modern computing, these algorithms are designed to be cache-aware and optimally mapped to heterogeneous multi-core CPU and many-core GPU architectures.
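To illustrate how the fast Fourier transform can accelerate a binding-configuration search, here is a minimal sketch of an FFT-based translational scan. It is not the CVC implementation. It assumes both molecules have already been rasterized onto the same periodic 3D grid, and it uses a toy score (grid overlap, with energy taken as the negative overlap) in place of a real binding-affinity model. The function name `best_translation` and the example grids are hypothetical.

```python
import numpy as np

def best_translation(receptor, ligand):
    """Score every cyclic translation of `ligand` against `receptor` at once.

    Both inputs are real-valued 3D grids of identical shape. The score of a
    shift t is sum_x receptor[x] * ligand[x - t], i.e. the overlap between
    the receptor and the ligand rolled by t (periodic boundaries assumed).
    """
    # Cross-correlation theorem: one forward FFT per grid and one inverse FFT
    # replace an explicit loop over all N^3 candidate translations.
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    scores = np.fft.ifftn(spectrum).real

    # Toy energy = -overlap, so the minimum-energy pose is the maximum score.
    flat_index = np.argmax(scores)
    shift = np.unravel_index(flat_index, scores.shape)
    return tuple(int(s) for s in shift), float(scores[shift])

if __name__ == "__main__":
    # Hypothetical example: a small box-shaped "ligand" and a "receptor" site
    # that is the same shape moved by (3, 5, 2) grid cells.
    ligand = np.zeros((16, 16, 16))
    ligand[0:3, 0:2, 0:4] = 1.0
    receptor = np.roll(ligand, shift=(3, 5, 2), axis=(0, 1, 2))

    shift, score = best_translation(receptor, ligand)
    print(shift)  # (3, 5, 2)
```

For a grid of N points per side, the direct scan costs on the order of N^6 operations, while the FFT version costs on the order of N^3 log N. A full docking search would repeat this scan over sampled ligand rotations and use a physically motivated scoring function.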