Journals and conference proceedings are the dominant mechanisms for reporting new biomedical results. The unstructured nature of such publications makes it difficult to apply data mining or automated knowledge discovery techniques. Annotation (or markup) of these unstructured documents is the first step in making them machine analyzable. In this paper we first present a system called BioAnnotator for identifying and annotating biological terms in documents. BioAnnotator uses domain-based dictionary look-up for recognizing known terms and a rule engine for discovering new terms. The combination of dictionary look-up and rules yields good performance (87% precision and 94% recall on the GENIA 1.1 corpus for extracting general biological terms, based on an approximate matching criterion). To demonstrate the subsequent mining and knowledge discovery activities that BioAnnotator makes feasible, we also present a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.
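To make the two-stage idea concrete, the following is a minimal sketch of combining dictionary look-up with pattern rules. It is not BioAnnotator's implementation: the abstract does not list the dictionaries or rules, so the `KNOWN_TERMS` set, the two regular expressions in `RULES`, and the `annotate` function are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy dictionary of known terms; not taken from the paper.
KNOWN_TERMS = {"nf-kappa b", "interleukin-2", "t cells"}

# Hypothetical rules for terms missing from the dictionary.
RULES = [
    re.compile(r"\b\w+ase\b"),        # enzyme-like suffix, e.g. "kinase"
    re.compile(r"\b[A-Za-z]+\d+\b"),  # gene-like symbol, e.g. "p53"
]


def annotate(text):
    """Return (term, source) pairs in order of appearance."""
    spans = []
    # Dictionary pass: case-insensitive match on whole terms.
    for term in KNOWN_TERMS:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "dictionary"))
    # Rule pass: propose candidate terms the dictionary does not know.
    for rule in RULES:
        for m in rule.finditer(text):
            spans.append((m.start(), m.end(), "rule"))
    # Resolve overlaps: dictionary hits first, then longer spans first.
    spans.sort(key=lambda s: (s[2] != "dictionary", -(s[1] - s[0]), s[0]))
    chosen = []
    for start, end, source in spans:
        if all(end <= c[0] or start >= c[1] for c in chosen):
            chosen.append((start, end, source))
    chosen.sort()
    return [(text[s:e], src) for s, e, src in chosen]


sentence = "Interleukin-2 activates tyrosine kinase and p53 in T cells."
for term, source in annotate(sentence):
    print(f"{term}\t{source}")
```

In this sketch, dictionary matches take priority over rule matches when spans overlap. That is a design assumption; a real system could rank or merge the two sources differently.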