Word Particles Applied to Information Retrieval

Authors:
Evandro B. Gouvêa;Bhiksha Raj
Affiliations:
Mitsubishi Electric Research Labs, Cambridge, MA 02139, USA;Mitsubishi Electric Research Labs, Cambridge, MA 02139, USA
Venue:
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Year:
2009

Abstract

Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are the primary carriers of semantic information. In this paper we propose the use of a different, phonetically defined unit of representation that we call "particles". Particles are phonetic sequences that do not possess meaning. Both documents and queries are converted from their standard word-based form into sequences of particles, and indexing and retrieval are then performed on particles. Experiments show that this scheme achieves retrieval performance comparable to that of words when the text in the documents and queries is clean, and can significantly improve retrieval when the text is noisy.
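
The pipeline described in the abstract (convert text to phonetic units, index them, match queries in the same space) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how particles are derived, so the sketch assumes overlapping phoneme bigrams as a stand-in. The pronunciation lexicon, the function names (to_particles, build_index, search) and the count-based scoring are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy pronunciation lexicon (word -> phonemes); not from the paper.
LEXICON = {
    "night": ["N", "AY", "T"],
    "nite": ["N", "AY", "T"],
    "flight": ["F", "L", "AY", "T"],
    "flite": ["F", "L", "AY", "T"],
    "to": ["T", "UW"],
    "boston": ["B", "AO", "S", "T", "AH", "N"],
    "weather": ["W", "EH", "DH", "ER"],
    "report": ["R", "IH", "P", "AO", "R", "T"],
}


def to_particles(text, n=2):
    """Map text to overlapping phoneme n-grams.

    Assumption: n-grams stand in for particles; the paper's actual
    particle definition is not given in the abstract.
    """
    phones = []
    for word in text.lower().split():
        phones.extend(LEXICON.get(word, []))  # words missing from the lexicon are skipped
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(docs, n=2):
    """Build an inverted index: particle -> Counter of document ids."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for particle in to_particles(text, n):
            index[particle][doc_id] += 1
    return index


def search(index, query, n=2):
    """Rank documents by summed counts of particles shared with the query."""
    scores = Counter()
    for particle in to_particles(query, n):
        for doc_id, count in index.get(particle, {}).items():
            scores[doc_id] += count
    return scores.most_common()


DOCS = {
    "d1": "night flight to boston",
    "d2": "boston weather report",
}
INDEX = build_index(DOCS)

# A noisy query: neither "nite" nor "flite" occurs as a word in any document.
RESULTS = search(INDEX, "nite flite")
print(RESULTS)
```

In this toy setup the misspelled query shares no words with either document, so an exact word match would return nothing, yet it still retrieves d1 because the misspellings map to the same phoneme sequences as "night" and "flight". That is the kind of robustness to noisy text the abstract reports, shown here only under the assumptions stated above.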