In this paper, we describe the use of text encoding and prediction by partial matching (PPM) language modeling to identify gene functions within abstracts of biomedical papers. The National Center for Biotechnology Information maintains "GeneRIF", a collection of the best possible functional representations for a subset of abstracts from PubMed. We use GeneRIF to test the effectiveness of our technique. We discuss the methodology adopted to construct the models that enable the Text Mining Toolkit to distinguish between gene functions and the rest of the abstract (non-gene functions). We also describe the similarity-based approach we apply to the list of automatically annotated functions to produce the gene function most likely to represent the paper. The results indicate that our combined approach to identifying gene functions in scientific abstracts performs very well on both precision and recall, and therefore presents opportunities for extracting other entities embedded in scientific text.
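The two-stage idea described above (score text under a "gene function" model and a "non-gene function" model, then choose one representative function by similarity) can be illustrated with a small sketch. This is not the Text Mining Toolkit and not a real PPM implementation: it assumes a fixed-order character model with add-one smoothing as a simplified stand-in for PPM, token-level Jaccard overlap as the similarity measure, and a handful of invented training sentences. All names (`CharModel`, `classify`, `most_representative`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing.

    A simplified stand-in for PPM: it has no escape mechanism and does
    not blend context orders, but it yields a comparable code length.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(Counter)  # context -> next-char counts

    def _pad(self, text):
        return " " * self.order + text.lower()

    def train(self, texts):
        for text in texts:
            padded = self._pad(text)
            for i in range(self.order, len(padded)):
                self.counts[padded[i - self.order:i]][padded[i]] += 1

    def bits_per_char(self, text):
        """Average code length of `text` under this model, in bits."""
        padded = self._pad(text)
        bits = 0.0
        for i in range(self.order, len(padded)):
            seen = self.counts.get(padded[i - self.order:i])
            count = seen[padded[i]] if seen else 0
            total = sum(seen.values()) if seen else 0
            bits -= math.log2((count + 1) / (total + self.alphabet_size))
        return bits / max(1, len(text))


def classify(sentence, function_model, other_model):
    """Label a sentence by whichever model encodes it in fewer bits."""
    f_bits = function_model.bits_per_char(sentence)
    o_bits = other_model.bits_per_char(sentence)
    return "function" if f_bits < o_bits else "other"


def jaccard(a, b):
    """Token-set overlap between two strings (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def most_representative(candidates):
    """Return the candidate with the highest total similarity to the rest."""
    if not candidates:
        return None
    def score(i):
        return sum(jaccard(candidates[i], c)
                   for j, c in enumerate(candidates) if j != i)
    return candidates[max(range(len(candidates)), key=score)]


# Invented toy data, for illustration only.
function_model = CharModel()
function_model.train([
    "regulates transcription of target genes",
    "binds dna and activates transcription",
    "inhibits kinase activity in cells",
])
other_model = CharModel()
other_model.train([
    "samples were collected from patients",
    "we thank the reviewers for comments",
    "data were analyzed using standard methods",
])

label = classify("activates transcription of target genes",
                 function_model, other_model)
best = most_representative([
    "activates transcription of target genes",
    "regulates transcription of genes",
    "activates transcription",
    "inhibits kinase activity",
])
```

In this sketch a sentence is labeled by the model that compresses it best, and the representative function is the candidate closest to all the others. A real system would need far more training text, and the paper's own similarity measure may differ from the Jaccard overlap assumed here.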