Combining biological databases and text mining to support new bioinformatics applications

Authors:
René Witte;Christopher J. O. Baker
Affiliations:
Institute for Program Structures and Data Organization (IPD), Universität Karlsruhe (TH), Germany;Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
Venue:
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Year:
2005

Abstract

A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, natural language processing approaches are becoming increasingly important for bioinformatics projects. In this paper, we go beyond simply extracting information from full-text articles: we describe an architecture that uses the results of text mining research papers to support targeted access to information in biological databases, thereby integrating information from both sources within a biological application. The described architecture is currently being used to extract information about protein mutations from full-text research papers. Text mining results drive the retrieval of sequence information from protein databases and the use of algorithmic sequence analysis tools, which in turn facilitate further data access from protein structure databases. A complex mapping of NLP-derived text annotations to protein structures allows information that is not available in databases of mutation annotations to be rendered in 3D structure visualizations.
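
The flow described in the abstract (mutation mentions from text, then sequence lookup, then mapping onto residue positions) can be illustrated with a minimal sketch. It is not the authors' implementation: a simple regular expression stands in for the NLP text mining step, a hard-coded string stands in for a sequence retrieved from a protein database, and a numbering-offset search stands in for the sequence analysis tools. All names (`Mutation`, `extract_mutations`, `find_numbering_offset`) are hypothetical.

```python
import re
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: mutations are written as wild-type residue, position, mutant
# residue in one-letter code (e.g. "N35D"). Real papers use many more forms.
MUTATION_PATTERN = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


@dataclass(frozen=True)
class Mutation:
    wild_type: str
    position: int  # 1-based position as numbered in the paper
    mutant: str


def extract_mutations(text):
    """Return every point-mutation mention found in the text, in order."""
    return [Mutation(w, int(p), m) for w, p, m in MUTATION_PATTERN.findall(text)]


def find_numbering_offset(mutations, sequence, max_shift=50):
    """Find the smallest shift that aligns all wild-type residues.

    Numbering in a paper often differs from database numbering (for example
    because of a signal peptide). Returns the shift s such that
    sequence[position + s - 1] equals the wild-type residue for every
    mutation, or None if no shift within max_shift fits.
    """
    if not mutations:
        return None
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        if all(
            0 <= m.position + shift - 1 < len(sequence)
            and sequence[m.position + shift - 1] == m.wild_type
            for m in mutations
        ):
            return shift
    return None


# Hypothetical inputs: a sentence from a paper and a made-up sequence.
sentence = "The N35D and E78Q variants showed reduced activity."
sequence = "A" * 36 + "N" + "A" * 42 + "E" + "A" * 10

mutations = extract_mutations(sentence)
shift = find_numbering_offset(mutations, sequence)
# Positions in database numbering, ready to be highlighted on a structure.
mapped_positions = [m.position + shift for m in mutations]
```

Checking the wild-type residue against the retrieved sequence is one plausible way to validate a text-mined mutation before it is mapped onto a structure; mentions that cannot be aligned under any shift would be flagged rather than visualized.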