Relaxation in text search using taxonomies

Authors:
Marcus Fontoura;Vanja Josifovski;Ravi Kumar;Christopher Olston;Andrew Tomkins;Sergei Vassilvitskii
Affiliations:
Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result quality. The model may be applicable in many settings, including multifaceted, product, and local search, where documents are augmented with hierarchical metadata such as topic or location. We present efficient algorithms for indexing and query processing in this retrieval model. We decompose query processing into two sub-problems: first, an online search problem that determines the overall level of relaxation cost that must be incurred to generate the top k results; and second, a budgeted relaxation search problem in which all results at a particular relaxation cost must be produced at minimal cost. We show that the latter problem is solvable exactly in two hierarchical dimensions and is NP-hard in three or more dimensions, where it nonetheless admits efficient approximation algorithms with provable guarantees. We present experimental results evaluating our algorithms on both synthetic and real data, showing order-of-magnitude improvements over the baseline algorithm.
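
To make the two-step decomposition concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm, and none of its names come from the paper. It assumes two toy taxonomies (topic and location), a hypothetical cost model in which relaxing a restriction by one level toward the root costs 1, and a handful of invented documents. The inner function returns every result reachable within a given relaxation budget, and the outer function searches for the smallest budget that yields at least k results.

```python
from itertools import product

# Hypothetical toy taxonomies: each node maps to its parent (None marks the root).
TOPIC_PARENT = {"food": None, "restaurants": "food",
                "pizza": "restaurants", "sushi": "restaurants"}
LOCATION_PARENT = {"usa": None, "california": "usa",
                   "sunnyvale": "california", "san_jose": "california"}

# Invented documents, each tagged with one node per taxonomy.
DOCS = [
    {"id": 1, "text": "best pizza deals", "topic": "pizza", "location": "sunnyvale"},
    {"id": 2, "text": "pizza by the slice", "topic": "pizza", "location": "san_jose"},
    {"id": 3, "text": "sushi and pizza buffet", "topic": "sushi", "location": "sunnyvale"},
    {"id": 4, "text": "pizza oven repair", "topic": "food", "location": "usa"},
]


def ancestors(node, parent):
    """Return [node, parent, grandparent, ...]; the index is the number of levels relaxed."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def under(doc_node, query_node, parent):
    """True if doc_node equals query_node or lies below it in the taxonomy."""
    return query_node in ancestors(doc_node, parent)


def results_at_budget(keyword, topic, location, budget):
    """Budgeted step (assumed cost model): ids of all documents that match the
    keyword under some relaxation whose total number of levels is <= budget."""
    topic_chain = ancestors(topic, TOPIC_PARENT)
    loc_chain = ancestors(location, LOCATION_PARENT)
    found = set()
    for i, j in product(range(len(topic_chain)), range(len(loc_chain))):
        if i + j > budget:
            continue
        for doc in DOCS:
            if (keyword in doc["text"].split()
                    and under(doc["topic"], topic_chain[i], TOPIC_PARENT)
                    and under(doc["location"], loc_chain[j], LOCATION_PARENT)):
                found.add(doc["id"])
    return found


def top_k(keyword, topic, location, k):
    """Search step: try budgets in increasing order and stop at the first one
    that yields at least k results (or at the maximum possible budget)."""
    max_budget = (len(ancestors(topic, TOPIC_PARENT))
                  + len(ancestors(location, LOCATION_PARENT)) - 2)
    for budget in range(max_budget + 1):
        found = results_at_budget(keyword, topic, location, budget)
        if len(found) >= k:
            break
    return budget, sorted(found)


print(top_k("pizza", "pizza", "sunnyvale", 2))
```

The sketch enumerates every relaxation and scans every document, which is exactly the kind of work an efficient index and query processor would avoid; it is meant only to illustrate what the budget and the top-k search refer to.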