Query by document

Authors:
Yin Yang; Nilesh Bansal; Wisam Dakka; Panagiotis Ipeirotis; Nick Koudas; Dimitris Papadias
Affiliations:
Computer Science, HKUST; University of Toronto; Search Quality, Google Inc.; New York University; University of Toronto; Computer Science, HKUST
Venue:
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Year:
2009

Abstract

We are experiencing an unprecedented increase in content contributed by users in forums such as blogs, social networking sites, and microblogging services. This abundance of content complements content on websites and in traditional media such as newspapers and news and financial streams. Given such a plethora of information, there is a pressing need to cross-reference information across textual services. For example, we often read a news item and wonder whether any blogs report related content, or vice versa. In this paper, we present techniques to automate the process of cross-referencing online information content. We introduce methodologies to extract phrases from a given "query document" for use as queries to search interfaces, with the goal of retrieving content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia, and introduce an algorithm called RelevanceRank for this purpose. We discuss both techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.
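The abstract does not say which two phrase-scoring techniques the paper uses, so the following is not the authors' method. It is a minimal sketch of the general query-by-document idea: extract candidate phrases (word n-grams) from the query document, score them, and keep the top few as search queries. As an assumption, it swaps in a standard smoothed tf-idf score computed against a small background collection. The names `candidate_phrases`, `score_phrases`, `top_k_queries`, and `STOPWORDS`, along with the background documents, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used to trim phrase edges.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "is", "was", "for"}


def candidate_phrases(text, max_n=3):
    """Return word n-grams (n = 1..max_n) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def score_phrases(query_doc, background_docs, max_n=3):
    """Score each phrase by its frequency in the query document times a smoothed idf.

    The idf is computed over a background collection (an assumption of this sketch),
    so phrases frequent in the query document but rare elsewhere score highest.
    """
    tf = Counter(candidate_phrases(query_doc, max_n))
    n_docs = len(background_docs)
    doc_sets = [set(candidate_phrases(d, max_n)) for d in background_docs]
    scores = {}
    for phrase, count in tf.items():
        df = sum(1 for s in doc_sets if phrase in s)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[phrase] = count * idf
    return scores


def top_k_queries(query_doc, background_docs, k=3, max_n=3):
    """Return the k highest-scoring phrases; ties are broken alphabetically."""
    scores = score_phrases(query_doc, background_docs, max_n)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]


query = "The central bank raised interest rates. Interest rates affect mortgage costs."
background = [
    "The weather was sunny today.",
    "The team won the game.",
    "Stock markets fell on the news.",
]
print(top_k_queries(query, background))  # ['interest', 'interest rates', 'rates']
```

Each returned phrase would then be submitted to a search interface (for example, a blog search engine) to retrieve content related to the query document. Trimming stopwords only at phrase edges keeps phrases such as "central bank" while dropping fragments such as "the central". This sketch ignores sentence boundaries, so n-grams like "rates interest" can span two sentences.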