Unsupervised corpus distillation for represented indicator measurement on focus species detection

Authors:
Chih-Hsuan Wei;Hung-Yu Kao
Affiliations:
Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan ROC;Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan ROC
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2013

Abstract

In biomedical text mining, the largest source of gene ambiguity is the species with which an entity is associated. Furthermore, focus species detection is one of the bottlenecks in gene normalisation. This study presents a method that is robust for all types of articles, particularly those without explicit species mentions. Because the method requires a training corpus, we developed an iterative distillation method to extend the corpus. The resulting unsupervised corpus is therefore helpful for the detection of focus species. In experiments, the proposed method achieved accuracies of 85.64% and 84.32% on datasets with and without species mentions, respectively.
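
The abstract does not describe the distillation procedure in detail. As a rough illustration only, the sketch below shows a generic self-training loop of the kind that "iterative distillation to extend a training corpus" suggests: a classifier is trained on a small labelled seed, confidently classified unlabelled articles are added to the corpus, and the process repeats. The smoothed bag-of-words scorer, the `threshold` and `max_rounds` parameters, and all function names are assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict
import math


def train(labeled):
    """Build per-species word counts from (tokens, species) pairs."""
    counts = defaultdict(Counter)
    for tokens, species in labeled:
        counts[species].update(tokens)
    return counts


def predict(counts, tokens):
    """Return (best_species, confidence) from add-one smoothed word likelihoods."""
    vocab = {w for c in counts.values() for w in c}
    scores = {}
    for species, c in counts.items():
        total = sum(c.values()) + len(vocab)
        scores[species] = sum(math.log((c[w] + 1) / total) for w in tokens)
    # Normalise the log scores into a confidence between 0 and 1.
    top = max(scores.values())
    exp = {s: math.exp(v - top) for s, v in scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def distill(seed, unlabeled, threshold=0.75, max_rounds=5):
    """Grow the labelled corpus by repeatedly accepting confident predictions.

    Hypothetical parameters: `threshold` is the minimum confidence for an
    article to be added; `max_rounds` caps the number of iterations.
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        counts = train(labeled)
        accepted, remaining = [], []
        for tokens in pool:
            species, conf = predict(counts, tokens)
            if conf >= threshold:
                accepted.append((tokens, species))
            else:
                remaining.append(tokens)
        if not accepted:
            break  # nothing confident enough; the corpus has stopped growing
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool
```

Called as `distill(seed, unlabeled)`, where `seed` is a list of `(tokens, species)` pairs and `unlabeled` is a list of token lists, the function returns the extended corpus together with the articles that never reached the confidence threshold. Raising `threshold` trades corpus size for label purity.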