C4.5: programs for machine learning
C4.5: programs for machine learning
Information Retrieval
Constructing Biological Knowledge Bases by Extracting Information from Text Sources
Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
A statistical profile of the Named Entity task
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Nymble: a high-performance learning name-finder
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
An empirical study of smoothing techniques for language modeling
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Extracting the names of genes and gene products with a hidden Markov model
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
MUC5 '93 Proceedings of the 5th conference on Message understanding
Named entity recognition in biomedical texts using an HMM model
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Bio-medical entity extraction using support vector machines
Artificial Intelligence in Medicine
A novel approach to automatic gazetteer generation using Wikipedia
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
Named entity recognition in Wikipedia
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
Learning multilingual named entity recognition from Wikipedia
Artificial Intelligence
Hi-index | 0.00 |
We present two measures for comparing corpora based on information theory statistics such as gain ratio as well as simple term-class frequency counts. We tested the predictions made by these measures about corpus difficulty in two domains --- news and molecular biology --- using the result of two well-used paradigms for NE, decision trees and HMMs and found that gain ratio was the more reliable predictor.