We describe an algorithm that combines lexical information from WordNet 1.7 with Web directories from the Open Directory Project to associate word senses with Web directories. These associations serve as rich sense characterizations that can be used to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations. We evaluate the algorithm on the 29 nouns (147 senses) used in the Senseval 2 competition, obtaining 148 (word sense, Web directory) associations that cover 88% of the domain-specific word senses in the test data with 86% accuracy. We then evaluate the richness of Web directories as sense characterizations in a supervised word sense disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. This supports our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.