From keywords to keyqueries: content descriptors for the web

Authors:
Tim Gollub;Matthias Hagen;Maximilian Michel;Benno Stein
Affiliations:
Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013


Abstract

We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: the keyqueries for a document are the minimal queries that return the document in the top result ranks. Besides applications in information retrieval and data mining, keyqueries have the potential to form the basis of a dynamic classification system for future digital libraries, as the modern version of keywords for content description. To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies. For applications where a small number of diverse keyqueries is sufficient, two tailored search strategies are proposed. Our experiments emphasize the role of the reference search engine and show the potential of keyqueries as innovative document descriptors for large, fast-evolving bodies of digital content such as the web.
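
The definition above can be made concrete with a small sketch. It is not the authors' algorithm: the abstract does not give the search procedure or the pruning rules, so everything below is an assumption made for illustration. The toy `search` function stands in for the reference search engine (conjunctive matching, ranked by summed term frequency), `k` is a hypothetical cutoff for "top result ranks", and `max_len` is a hypothetical bound on query length. The only pruning shown is the one that follows directly from minimality: a query that contains an already-found keyquery is never submitted to the engine.

```python
from collections import Counter
from itertools import combinations


def search(index, query):
    """Toy stand-in for the reference engine (an assumption, not the paper's).

    A document matches if it contains every query term; matches are ranked
    by summed term frequency, with ties broken by document id.
    """
    hits = []
    for doc_id, tf in index.items():
        if all(term in tf for term in query):
            hits.append((sum(tf[term] for term in query), doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


def keyqueries(index, doc_id, vocabulary, k=1, max_len=3):
    """Enumerate minimal queries that rank doc_id within the top k results.

    Queries are tried level by level (1 term, then 2, ...), so any query
    that contains a previously found keyquery cannot be minimal and is
    skipped without calling the engine.
    """
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(sorted(vocabulary), size):
            query = frozenset(combo)
            if any(kq <= query for kq in found):
                continue  # not minimal: a subset already retrieves doc_id
            if doc_id in search(index, combo)[:k]:
                found.append(query)
    return found


# Hypothetical three-document collection, used only for illustration.
docs = {
    "d1": "keyqueries describe web documents".split(),
    "d2": "keywords describe library documents".split(),
    "d3": "library library search engines".split(),
}
index = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

result = keyqueries(index, "d2", index["d2"].keys(), k=1)
print([sorted(q) for q in result])
# prints [['keywords'], ['describe', 'library'], ['documents', 'library']]
```

In this toy collection, "library" alone is not a keyquery for d2 because d3 outranks it, whereas "describe library" is. This illustrates the abstract's point that keyqueries depend on the index and retrieval model of the reference engine: changing the collection or the ranking function changes the descriptors.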