Named entity disambiguation links a potentially ambiguous mention of a named entity in text to an unambiguous identifier in a standard database. One approach to this task is supervised classification. However, training data is often scarce, and the available data sets tend to be imbalanced and, in some cases, heterogeneous. We propose a new method that distinguishes a named entity by finding the informative keywords in its surrounding context and then training a model to predict whether each keyword indicates the semantic class of the entity. While maintaining performance comparable to supervised classification, this method avoids the need for expensive manually annotated data for each new domain and thus achieves better portability.
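
The keyword-based idea can be illustrated with a minimal sketch. This is not the method as implemented in the paper: the keyword table, the window size, the majority vote, and the species labels are all hypothetical stand-ins chosen for illustration. In particular, the hand-written `KEYWORD_CLASS` dictionary takes the place of the trained model that predicts whether a keyword indicates a semantic class.

```python
import re
from collections import Counter

# Hypothetical keyword-to-class table. In the approach described above, this
# mapping would be predicted by a trained model, not written by hand.
KEYWORD_CLASS = {
    "mouse": "Mus musculus",
    "murine": "Mus musculus",
    "human": "Homo sapiens",
    "patient": "Homo sapiens",
    "yeast": "Saccharomyces cerevisiae",
}


def context_keywords(text, mention, window=5):
    """Collect tokens within `window` positions of each occurrence of `mention`."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    target = mention.lower()
    keywords = []
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            keywords.extend(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return keywords


def disambiguate(text, mention, window=5):
    """Return the class indicated by most context keywords, or None if none match."""
    votes = Counter(
        KEYWORD_CLASS[k]
        for k in context_keywords(text, mention, window)
        if k in KEYWORD_CLASS
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


sentence = "Expression of p53 was measured in murine fibroblasts and mouse liver tissue."
label = disambiguate(sentence, "p53")
```

In this toy setup, no annotated examples of the entity itself are needed; only the keyword indicators are, which is the source of the portability the abstract claims. A simple majority vote is assumed here purely for brevity.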