Weighting query terms based on distributional statistics

Authors:
Jussi Karlgren;Magnus Sahlgren;Rickard Cöster
Affiliations:
Swedish Institute of Computer Science, Kista, SE, Sweden;Swedish Institute of Computer Science, Kista, SE, Sweden;Swedish Institute of Computer Science, Kista, SE, Sweden
Venue:
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Year:
2005

Abstract

This year, the SICS team has concentrated on query processing and on the internal topical structure of the query, specifically compound translation. Compound translation is non-trivial because of dependencies between the elements of a compound. We have also investigated topical dependencies between query terms: if a query term is non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. The two experiments described here are based on an analysis of the distributional character of query terms. One uses the similarity of occurrence contexts between query terms, measured globally across the entire collection; the other uses the likelihood that an individual term appears topically in individual texts. Both boosting schemes, which are complementary, improved retrieval results.
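The two distributional signals described above can be illustrated with a small sketch. This is not the authors' implementation; it only assumes plausible stand-ins for each idea. For context similarity, each term gets a sparse co-occurrence vector (counts of neighbours within a hypothetical window of 2 tokens), and a query term's score is its mean cosine similarity to the other query terms. For topicality, it assumes a burstiness-style ratio: among documents containing the term, the fraction in which it occurs at least twice. The additive combination and the weights `alpha` and `beta` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(docs, window=2):
    """Sparse co-occurrence vectors: neighbour counts within +/- window tokens."""
    vecs = defaultdict(Counter)
    for doc in docs:
        for i, term in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[term][doc[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_scores(query, vecs):
    """Mean context similarity of each query term to the other query terms."""
    terms = list(dict.fromkeys(query))  # de-duplicate, keep order
    scores = {}
    for t in terms:
        others = [u for u in terms if u != t]
        if not others:
            scores[t] = 0.0
            continue
        sims = [cosine(vecs.get(t, Counter()), vecs.get(u, Counter())) for u in others]
        scores[t] = sum(sims) / len(others)
    return scores


def topicality_scores(query, docs):
    """Assumed topicality measure: P(term occurs >= 2 times | term occurs in doc)."""
    scores = {}
    for t in dict.fromkeys(query):
        counts = [doc.count(t) for doc in docs]
        df = sum(1 for c in counts if c >= 1)
        repeats = sum(1 for c in counts if c >= 2)
        scores[t] = repeats / df if df else 0.0
    return scores


def query_weights(query, docs, alpha=1.0, beta=1.0):
    """Hypothetical boost: 1 + alpha * context score + beta * topicality score."""
    ctx = context_scores(query, context_vectors(docs))
    top = topicality_scores(query, docs)
    return {t: 1.0 + alpha * ctx[t] + beta * top[t] for t in ctx}


toy_docs = [
    "wind power turbines generate power from wind".split(),
    "the wind turbines and power grid".split(),
    "the cat sat on the mat".split(),
    "the report discusses the grid".split(),
]
weights = query_weights(["wind", "power", "report"], toy_docs)
# "report" occurs only once per document and shares little context with the
# other query terms, so it receives the lowest weight of the three.
print(weights)
```

Keeping the two scores separate reflects the point that the schemes are complementary: one is a global, collection-wide property of term pairs, the other a per-document property of single terms.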