The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. This article explores several research questions: 1) how to create a simple, language-independent, corpus-based stemmer; 2) how to identify sub-words and which types of sub-words are suitable as indexing units; and 3) how to apply blind relevance feedback to sub-words and how feedback term selection is affected by the type of indexing unit. More than 140 IR experiments were conducted using the BM25 retrieval model on the topic titles and descriptions (TD) of the FIRE 2008 English, Bengali, Hindi, and Marathi document collections.

The major findings are as follows. The corpus-based stemming approach is effective as a knowledge-light term conflation step and is useful when few language-specific resources are available. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR.

Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance than word indexing. No single method performs best for all languages: for English, indexing with the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest mean average precision (MAP). However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi yield the highest MAP.

Sub-word identification is a general case of decompounding. It produces one or more index terms for a single word form, which increases the number of index terms but decreases their average length.
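The abstract names several sub-word indexing units without defining them. The Python sketch below shows one plausible reading of each, assuming lowercase Latin-script input: overlapping character n-grams, k-prefixes (the word truncated to its first k characters), and a simple consonant-vowel split in which a new chunk starts whenever a consonant follows a vowel. The function names and the vowel set are hypothetical, and the article's exact definitions may differ. For Bengali, Hindi, or Marathi, a consonant-vowel split would have to work on the script's own consonant letters and dependent vowel signs, which this sketch does not attempt.

```python
from typing import Callable

# Assumption: lowercase Latin-script input; only a, e, i, o, u count as vowels.
VOWELS = frozenset("aeiou")


def overlapping_ngrams(word: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams; words of length <= n are kept whole."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def prefix(word: str, k: int = 4) -> list[str]:
    """A k-prefix: the word truncated to its first k characters."""
    return [word[:k]]


def cv_sequences(word: str) -> list[str]:
    """Split into consonant*-vowel+ chunks (one hypothetical CV definition).

    A new chunk starts when a consonant follows a vowel; consonants left
    over at the end of the word are attached to the last chunk.
    """
    chunks: list[str] = []
    current = ""
    for ch in word:
        if current and ch not in VOWELS and current[-1] in VOWELS:
            chunks.append(current)
            current = ""
        current += ch
    if current:
        if chunks and not any(c in VOWELS for c in current):
            chunks[-1] += current  # trailing consonant cluster
        else:
            chunks.append(current)  # whole word, or a chunk ending in a vowel
    return chunks


def index_terms(words: list[str], unit: Callable[[str], list[str]]) -> list[str]:
    """Map each word form to one or more index terms using the given unit."""
    return [term for word in words for term in unit(word)]


if __name__ == "__main__":
    words = ["information", "retrieval"]
    print(overlapping_ngrams("retrieval"))  # ['ret', 'etr', 'tri', 'rie', 'iev', 'eva', 'val']
    print(prefix("information"))            # ['info']
    print(cv_sequences("retrieval"))        # ['re', 'trie', 'val']
    print(len(index_terms(words, overlapping_ngrams)))  # 16
```

For the two words "information" and "retrieval" (average length 10 characters), overlapping 3-grams produce 16 index terms of length 3, illustrating the trade-off of sub-word indexing: more index terms, each shorter.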
The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of feedback terms than relevance feedback on word forms. Similarly, choosing the number of relevance feedback terms according to the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases retrieval effectiveness compared with using the same fixed number of terms for all languages.
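A minimal Python sketch of blind relevance feedback with a vocabulary-dependent term count follows. It assumes the top-ranked documents (10 in the experiments described above) have already been retrieved, for example with BM25, and are given as lists of index terms. Candidate terms are ranked by the Robertson offer weight (r times the Robertson/Sparck Jones relevance weight), a common blind-feedback selection criterion; whether the article uses this exact criterion is an assumption. The rule in `scaled_term_count`, which multiplies a fixed count such as 20 by the ratio of word vocabulary size to sub-word vocabulary size, is likewise a hypothetical reading of the sentence above rather than the article's formula. All function names are illustrative.

```python
import math
from collections import Counter


def rsj_weight(r: int, R: int, n: int, N: int) -> float:
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: feedback docs containing the term, R: number of feedback docs,
    n: collection docs containing the term, N: collection size.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5))
                    / ((n - r + 0.5) * (R - r + 0.5)))


def select_feedback_terms(top_docs, doc_freq, num_docs, num_terms):
    """Rank terms from the top-ranked documents by offer weight r * w.

    top_docs: the R feedback documents, each a list of index terms
    (words, stems, or sub-words); doc_freq: term -> collection document
    frequency (must cover every term in top_docs). Ties are broken
    alphabetically so the output is deterministic.
    """
    R = len(top_docs)
    r_counts = Counter(t for doc in top_docs for t in set(doc))
    offer = {t: r * rsj_weight(r, R, doc_freq[t], num_docs)
             for t, r in r_counts.items()}
    return sorted(offer, key=lambda t: (-offer[t], t))[:num_terms]


def scaled_term_count(base_terms, word_vocab_size, subword_vocab_size):
    """Hypothetical rule: scale a fixed term count by |V_word| / |V_subword|."""
    return max(1, round(base_terms * word_vocab_size / subword_vocab_size))


if __name__ == "__main__":
    # Toy data (hypothetical): two feedback documents in a 10-document collection.
    top_docs = [["cat", "dog"], ["cat", "fish"]]
    doc_freq = {"cat": 2, "dog": 5, "fish": 2}
    print(select_feedback_terms(top_docs, doc_freq, num_docs=10, num_terms=2))
    # ['cat', 'fish']
    print(scaled_term_count(20, word_vocab_size=50_000, subword_vocab_size=10_000))
    # 100
```

With a smaller sub-word vocabulary (here a hypothetical 10,000 sub-words versus 50,000 words), the rule selects 100 feedback terms instead of 20, in line with the observation that feedback on sub-words benefits from selecting more terms.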