Biological macromolecules play a fundamental role in disease; therefore, they are of great interest to fields such as pharmacology and chemical genomics. Yet due to macromolecules' complexity, the development of effective techniques for elucidating macromolecular structure-function relationships has been poorly explored. Previous techniques have focused either on sequence analysis, which only approximates structure-function relationships, or on small coordinate datasets, an approach that neither scales to large datasets nor handles noise. We present a novel scalable approach to efficiently discover macromolecule substructures based on three-dimensional coordinate data, without domain-specific knowledge. The approach combines structure-based frequent pattern discovery with search space reduction and coordinate noise handling. We analyze computational performance compared to traditional approaches, validate that our approach can discover meaningful substructures in noisy macromolecule data by automated discovery of primary and secondary protein structures, and show that our technique is superior to sequence-based approaches at determining structural, and thus functional, similarity between proteins.