Context representation using word sequences extracted from a news corpus

  • Authors:
  • Hiroshi Sekiya, Takeshi Kondo, Makoto Hashimoto, Tomohiro Takagi

  • Affiliations:
  • Department of Computer Science, Meiji University, 1-1-1, Higashi Mita, Tama-ku, Kawasaki-shi, Kanagawa 214-8571, Japan (all authors)

  • Venue:
  • International Journal of Approximate Reasoning
  • Year:
  • 2007


Abstract

Ambiguity is one of the most difficult problems in handling word senses with computers. Word senses vary dynamically depending on context, so we need to specify the context in order to identify them. However, context itself varies with the specificity and viewpoint of the topic. Consequently, depending on such varying contexts, people attend to only part of the attributes of the entity indicated by the word's dictionary definition. Handling word senses on a computer can be split into two steps: the first is to determine all the different senses of every word, and the second is to assign each occurrence of a word to the appropriate sense. In this paper, we propose a method focusing on the first step: generating atomic conceptual fuzzy sets from word sequences. Both the contexts identified by word sequences and the atomic conceptual fuzzy sets that express the word senses related to those contexts can then be shown concretely. We used the Reuters collection, consisting of 800,000 news articles, and automatically extracted word sequences and generated fuzzy sets using the confabulation model (a prediction method similar to the n-gram model) and five statistical measures as relations. We compared the compatibility between the confabulation model and each measure, and found that cogency and mutual information were the most effective in representing context. We demonstrate the usefulness of the word sequences for identifying context.
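The two measures the abstract singles out can be illustrated on bigram counts. The sketch below is not the authors' implementation: the toy corpus is hypothetical, cogency is taken in its usual confabulation-theory sense as the conditional probability P(a | b) of the antecedent word given the consequent, and mutual information is computed pointwise over bigrams; the paper's exact definitions and smoothing may differ.

```python
from collections import Counter
from math import log2

# Toy corpus standing in for the Reuters news articles (hypothetical data).
corpus = [
    "interest rate rise hits stock market".split(),
    "central bank raises interest rate".split(),
    "stock market falls on rate fears".split(),
]

# Count unigrams and adjacent word pairs (bigrams).
unigrams = Counter(w for doc in corpus for w in doc)
bigrams = Counter((a, b) for doc in corpus for a, b in zip(doc, doc[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def cogency(a, b):
    """P(a | b): how strongly seeing word b supports its predecessor a.
    Uses the unigram count of b as an approximation of its count as a
    bigram consequent (assumption made for this sketch)."""
    return bigrams[(a, b)] / unigrams[b]

def pmi(a, b):
    """Pointwise mutual information of the bigram (a, b): positive when
    the pair co-occurs more often than chance would predict."""
    p_ab = bigrams[(a, b)] / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return log2(p_ab / (p_a * p_b))
```

For example, `cogency("interest", "rate")` is 2/3 on this toy corpus (the bigram occurs twice, "rate" three times), and `pmi("interest", "rate")` is positive, marking the pair as a candidate word sequence for context representation.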