Optimization of bounded continuous search queries based on ranking distributions

Authors:
D. Kukulenz; N. Hoeller; S. Groppe; V. Linnemann
Affiliations:
Luebeck University, Institute of Information Systems, Luebeck, Germany (all authors)
Venue:
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
Year:
2007

Abstract

A common search problem on the World Wide Web is finding information when it is not known in advance when the sources will appear or how long they will remain available, as is the case for sales offers or news reports. Continuous queries are a means of monitoring the Web over a specific period of time. The main problems in optimizing such queries are providing high-quality, up-to-date results and controlling the amount of information returned by a continuous query engine. In this paper we present a new method for realizing such search queries, based on extracting the distribution of ranking values and on a new strategy for selecting relevant data objects in a stream of documents. The new method provides results of significantly higher quality if the ranking distributions can be modeled by Gaussian distributions. This is usually the case if a larger number of information sources on the Web and higher-quality candidates are considered.
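The abstract does not spell out the selection strategy itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that ranking scores from an earlier sample are roughly Gaussian, that the expected number of documents in the stream is known, and that at most k results may be returned. The names `gaussian_threshold` and `select_from_stream` are hypothetical.

```python
from statistics import NormalDist


def gaussian_threshold(sample_scores, expected_stream_size, k):
    """Estimate a score cutoff that about k of the expected documents exceed.

    Assumption (not from the paper): scores follow a Gaussian whose mean and
    standard deviation can be estimated from a previously observed sample.
    """
    if not 0 < k < expected_stream_size:
        raise ValueError("k must lie strictly between 0 and expected_stream_size")
    dist = NormalDist.from_samples(sample_scores)
    # Fraction of documents that should fall below the cutoff.
    return dist.inv_cdf(1 - k / expected_stream_size)


def select_from_stream(stream, threshold, k):
    """Return the first k (doc_id, score) pairs whose score reaches the threshold."""
    selected = []
    for doc_id, score in stream:
        if score >= threshold:
            selected.append((doc_id, score))
            if len(selected) == k:
                break  # the bound on returned results is reached
    return selected


if __name__ == "__main__":
    history = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    cutoff = gaussian_threshold(history, expected_stream_size=100, k=5)
    incoming = [("a", 0.2), ("b", 0.97), ("c", 0.55), ("d", 0.99)]
    print(cutoff, select_from_stream(incoming, cutoff, k=5))
```

Deciding on each document as it arrives keeps results up to date, while the bound k limits the amount of information returned. The trade-off is that a badly estimated distribution yields too few or too many matches.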