Automatic selection of class labels from a thesaurus for an effective semantic tagging of corpora

Authors: Alessandro Cucchiarelli; Paola Velardi
Affiliations: Università di Ancona; Università di Roma 'La Sapienza'
Venue: ANLC '97, Proceedings of the Fifth Conference on Applied Natural Language Processing
Year: 1997

Abstract

It is widely accepted that tagging text with semantic information would improve the quality of lexical learning in corpus-based NLP methods. However, available on-line taxonomies are rather entangled and introduce an unnecessary level of ambiguity. The noise produced by the excessive number of tags often outweighs the advantage of semantic tagging. In this paper we propose an automatic method to select from WordNet a subset of domain-appropriate categories that effectively reduce the overambiguity of WordNet and help identify and categorize relevant language patterns more compactly. The method is evaluated against SEMCOR, a manually tagged corpus.
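The abstract does not spell out how the categories are chosen. To make the trade-off it describes concrete (cover the domain's words with few tags while keeping the number of tags per word low), here is a minimal greedy sketch. Everything in it is an assumption for illustration: the `score` function, the `alpha` penalty weight, the greedy strategy, and the toy lexicon of coarse labels are hypothetical and are not the authors' algorithm or real WordNet data.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy lexicon: each word maps to the coarse categories reachable
# from its senses. Illustrative labels only, not real WordNet output.
Lexicon = Dict[str, Set[str]]


def score(lex: Lexicon, freq: Dict[str, int], chosen: Set[str], alpha: float) -> float:
    """Frequency-weighted coverage minus a penalty for residual ambiguity."""
    total = sum(freq.values())
    covered = 0
    extra_tags = 0  # tags beyond the first one per word, weighted by frequency
    for word, f in freq.items():
        n_tags = len(lex[word] & chosen)
        if n_tags:
            covered += f
            extra_tags += (n_tags - 1) * f
    if covered == 0:
        return 0.0
    return covered / total - alpha * (extra_tags / covered)


def select_categories(lex: Lexicon, freq: Dict[str, int],
                      max_tags: int, alpha: float = 0.5) -> List[str]:
    """Greedily add the category that most improves the score (hypothetical heuristic)."""
    candidates = sorted(set().union(*lex.values()))
    chosen: List[str] = []
    current = 0.0
    while len(chosen) < max_tags:
        best: Optional[str] = None
        best_score = current
        for cat in candidates:
            if cat in chosen:
                continue
            s = score(lex, freq, set(chosen) | {cat}, alpha)
            if s > best_score:  # strict: on ties the alphabetically first wins
                best, best_score = cat, s
        if best is None:  # no remaining category improves the score
            break
        chosen.append(best)
        current = best_score
    return chosen


# Toy domain corpus: word -> candidate categories, plus word frequencies.
lex: Lexicon = {
    "contract": {"communication", "act"},
    "payment": {"act", "possession"},
    "bank": {"group", "object", "artifact"},
    "manager": {"person"},
    "loan": {"possession"},
}
freq = {"contract": 4, "payment": 3, "bank": 5, "manager": 2, "loan": 2}

print(select_categories(lex, freq, max_tags=10))
# ['act', 'artifact', 'person', 'possession']
```

On this toy data the greedy loop stops after four of the seven labels: every word is covered, and adding any further label (for example `communication` for `contract`) would only reintroduce ambiguity, so the score drops. Raising `alpha` favors fewer, less overlapping tags; lowering it favors coverage.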