Automatic association of web directories with word senses

Authors:
Celina Santamaría;Julio Gonzalo;Felisa Verdejo
Affiliations:
ETS Ingeniería Informática de la UNED, c/Juan del Rosal, 16 Ciudad Universitaria, 28040 Madrid, Spain;ETS Ingeniería Informática de la UNED, c/Juan del Rosal, 16 Ciudad Universitaria, 28040 Madrid, Spain;ETS Ingeniería Informática de la UNED, c/Juan del Rosal, 16 Ciudad Universitaria, 28040 Madrid, Spain
Venue:
Computational Linguistics - Special issue on web as corpus
Year:
2003

Abstract

We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories. Such associations can be used as rich characterizations to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations. The algorithm is evaluated for the 29 nouns (147 senses) used in the Senseval 2 competition, obtaining 148 (word sense, Web directory) associations covering 88% of the domain-specific word senses in the test data with 86% accuracy. The richness of Web directories as sense characterizations is evaluated in a supervised word sense disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. The results support our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
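
As a rough illustration of what associating word senses with Web directories can look like, the sketch below scores each directory path by how many of its labels overlap with words related to a sense. It is a minimal sketch under stated assumptions, not the algorithm described in the paper: the sense identifiers, related-word sets, directory paths, and the `min_overlap` threshold are all hypothetical toy values, and a real system would draw the related words from WordNet and the paths from the Open Directory Project.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data (not from the paper): words one might collect from
# WordNet (synonyms, hypernyms) for two senses of the noun "circuit".
SENSE_WORDS: Dict[str, Set[str]] = {
    "circuit#electrical": {"electronics", "device", "electrical", "engineering"},
    "circuit#racing": {"racetrack", "sports", "motorsport", "course"},
}

# Hypothetical directory paths written in the style of a Web directory.
DIRECTORIES: List[str] = [
    "Science/Technology/Electronics/Circuits",
    "Sports/Motorsport/Circuits",
    "Arts/Music/Bands",
]


def path_labels(path: str) -> Set[str]:
    """Return the lowercased labels that make up a directory path."""
    return {label.lower() for label in path.split("/")}


def associate(
    sense_words: Dict[str, Set[str]],
    directories: List[str],
    min_overlap: int = 1,
) -> List[Tuple[str, str, int]]:
    """Pair each sense with every directory sharing at least `min_overlap` labels.

    The overlap count is an assumed, simplistic relevance score used only
    for illustration.
    """
    associations = []
    for sense, words in sense_words.items():
        for directory in directories:
            overlap = len(words & path_labels(directory))
            if overlap >= min_overlap:
                associations.append((sense, directory, overlap))
    return associations


if __name__ == "__main__":
    for sense, directory, overlap in associate(SENSE_WORDS, DIRECTORIES):
        print(f"{sense} -> {directory} (shared labels: {overlap})")
```

With this toy data, the electrical sense is paired only with the electronics directory and the racing sense only with the motorsport directory. Pages listed under an associated directory could then serve as candidate training examples for that sense, which is the use the abstract evaluates.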