We present an automatic approach to learning criteria for classifying the parts of speech used in lexical mappings. This will further automate our knowledge acquisition system for non-technical users. The criteria for the parts of speech are based on the types of the denoted terms, together with morphological and corpus-based clues. Associations between these clues and the parts of speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 parts of speech to choose from, the classifier achieves good results (77.8% correct). Higher accuracy (93.0%) is achieved in the special case of the mass-count distinction for nouns. Comparable results are also obtained using OpenCyc (73.1% general and 88.4% mass-count).
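To make the general idea concrete, the following is a minimal sketch, not the system described above. It assumes a handful of invented training examples, two hypothetical features (a two-letter suffix as a morphological clue and an ontology type for the denoted term), and a one-level decision tree chosen by plain information gain. The feature names, the ontology type labels, and the toy data are all assumptions made for illustration; corpus-based clues are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (headword, ontology type of denoted term, label).
# None of these rows come from Cyc; they only illustrate the data shape.
TRAIN = [
    ("water", "Substance", "MassNoun"),
    ("sand", "Substance", "MassNoun"),
    ("furniture", "Collection", "MassNoun"),
    ("dog", "Organism", "CountNoun"),
    ("chair", "Artifact", "CountNoun"),
    ("teacher", "Organism", "CountNoun"),
    ("book", "Artifact", "CountNoun"),
]


def features(word, onto_type):
    """Build a feature dict: one morphological clue, one type-based clue."""
    return {"suffix2": word[-2:], "onto_type": onto_type}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(rows, name):
    """Information gain from splitting rows on the feature called `name`."""
    groups = defaultdict(list)
    for feats, label in rows:
        groups[feats[name]].append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([label for _, label in rows]) - remainder


def train_stump(rows):
    """Pick the highest-gain feature and map each of its values to a label."""
    best = max(sorted(rows[0][0]), key=lambda name: info_gain(rows, name))
    by_value = defaultdict(list)
    for feats, label in rows:
        by_value[feats[best]].append(label)
    return {
        "feature": best,
        "table": {v: Counter(ls).most_common(1)[0][0]
                  for v, ls in by_value.items()},
        # Fallback for unseen feature values: overall majority label.
        "default": Counter(l for _, l in rows).most_common(1)[0][0],
    }


def predict(stump, word, onto_type):
    """Classify a word with the learned one-level tree."""
    value = features(word, onto_type)[stump["feature"]]
    return stump["table"].get(value, stump["default"])


rows = [(features(w, t), label) for w, t, label in TRAIN]
stump = train_stump(rows)
print(stump["feature"], predict(stump, "milk", "Substance"))
```

On this toy data the type-based feature separates the two classes cleanly, so it is selected over the suffix. A full decision-tree learner would recurse on the remaining features and would typically use a normalized criterion such as gain ratio rather than raw information gain.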