Category-based pseudowords

Authors:
Preslav I. Nakov;Marti A. Hearst
Affiliations:
EECS, UC Berkeley, Berkeley, CA;SIMS, UC Berkeley, Berkeley, CA
Venue:
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Year:
2003

Abstract

A pseudoword is a composite of two or more words chosen at random; each occurrence of the original words in a text is replaced by their conflation. Pseudowords are a useful mechanism for evaluating the impact of word sense ambiguity in many NLP applications. However, the standard method for constructing pseudowords has some drawbacks. Because the constituent words are chosen at random, the word contexts that surround pseudowords do not necessarily reflect the contexts in which real ambiguous words occur. This in turn leads to an optimistic upper bound on algorithm performance. To address these drawbacks, we propose the use of lexical categories to create more realistic pseudowords, and evaluate the results of different variations of this idea against the standard approach.
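
As a rough illustration of the standard construction described in the abstract, the sketch below conflates two words into one pseudoword and records which original word stood at each replaced position. The function names, the underscore-joined token format, and the toy sentence are assumptions made for this example; they are not taken from the paper, and the category-based variant the authors propose is not shown.

```python
def make_pseudoword(words):
    """Join the constituent words into a single conflated token (assumed format)."""
    return "_".join(words)


def conflate(tokens, words):
    """Replace each occurrence of a constituent word with the pseudoword.

    Returns the rewritten token list and a list of (position, original_word)
    pairs. The pairs act as gold labels: a disambiguation algorithm sees only
    the pseudoword and must recover the original word from context.
    """
    pseudo = make_pseudoword(words)
    targets = set(words)
    rewritten, gold = [], []
    for i, tok in enumerate(tokens):
        if tok in targets:
            rewritten.append(pseudo)
            gold.append((i, tok))
        else:
            rewritten.append(tok)
    return rewritten, gold


# Toy example (hypothetical sentence and word pair).
tokens = "the banana was ripe but the door was closed".split()
new_tokens, gold = conflate(tokens, ["banana", "door"])
```

Because the original word at each position is known, accuracy can be measured without manual sense annotation, which is what makes pseudowords convenient for evaluation.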