A similarity-based probability model for latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Algorithm 457: finding all cliques of an undirected graph
Communications of the ACM
Bioinformatics Computing
Geometric Hashing: An Overview
IEEE Computational Science & Engineering
Locality preserving indexing for document representation
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Journal of the American Society for Information Science and Technology - Bioinformatics
Dimensionality reduction for dimension-specific search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Optimal length of fragments for use in protein structure prediction
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
3D protein structure matching by patch signatures
DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
Data centric research at the University of Queensland
ACM SIGMOD Record
Searching biochemical structures is becoming an important application domain of information retrieval. This paper introduces a protein structure matching problem and formulates it as an information retrieval problem. We first present a novel vector representation for protein structures, in which a protein structural region, formed by the vectors within the region, is defined as a patch and indexed by its patch signature. For a k-sized patch, the patch signature consists of 7k - 10 inter-atom distances, which uniquely determine the patch's spatial structure. A patch matching function is then defined. Because protein structures are large and complex, it is computationally expensive to identify possible matching patches for a given protein against a large protein database. We propose applying dimensionality reduction to the patch signatures and show how the two problems are adapted to fit each other. Locality Preserving Projection (LPP) and Singular Value Decomposition (SVD) are chosen and tested for this purpose. Experimental results show that dimensionality reduction improves search speed while maintaining acceptable precision and recall. More generally, this paper demonstrates that information retrieval techniques can play a crucial role in solving this biologically critical but computationally expensive problem.
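As an illustration of the general idea of reducing patch signatures before searching, the following is a minimal sketch using truncated SVD with NumPy. It is not the paper's implementation. The function names (signature_length, fit_svd_projection, project, candidate_matches) are hypothetical, the signatures are random synthetic vectors rather than real protein data, and Euclidean distance in the reduced space is an assumed stand-in for the patch matching function, which the abstract does not specify. LPP is not shown.

```python
import numpy as np


def signature_length(k: int) -> int:
    """Number of inter-atom distances in a k-sized patch signature (7k - 10)."""
    return 7 * k - 10


def fit_svd_projection(signatures: np.ndarray, n_components: int):
    """Learn a truncated-SVD projection from an (n_patches, dim) signature matrix."""
    mean = signatures.mean(axis=0)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(signatures - mean, full_matrices=False)
    return mean, vt[:n_components].T  # basis has shape (dim, n_components)


def project(signatures: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map full-length signatures into the reduced space."""
    return (signatures - mean) @ basis


def candidate_matches(query, reduced_db, mean, basis, top_n=5):
    """Indices of the top_n database patches closest to the query in reduced space.

    Assumption: Euclidean distance stands in for the paper's matching function.
    """
    q = project(query[np.newaxis, :], mean, basis)[0]
    dists = np.linalg.norm(reduced_db - q, axis=1)
    return np.argsort(dists)[:top_n]


# Synthetic stand-in data (assumption): 200 random signatures for 5-sized patches.
rng = np.random.default_rng(0)
dim = signature_length(5)  # 7 * 5 - 10 = 25 distances per signature
database = rng.normal(size=(200, dim))

mean, basis = fit_svd_projection(database, n_components=8)
reduced_db = project(database, mean, basis)

# Query with a slightly perturbed copy of patch 17 and list the nearest candidates.
query = database[17] + rng.normal(scale=0.01, size=dim)
print(candidate_matches(query, reduced_db, mean, basis))
```

In a sketch like this, the reduced space serves only to shortlist candidates cheaply. A plausible follow-up step, again an assumption rather than something stated in the abstract, is to re-check the shortlisted patches against their full-length signatures.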