We propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result quality. The model is applicable in many settings, including multifaceted, product, and local search, where documents are augmented with hierarchical metadata such as topic or location. We present efficient algorithms for indexing and query processing in this retrieval model. We decompose query processing into two sub-problems: first, an online search problem that determines the overall level of relaxation cost that must be incurred to generate the top k results; and second, a budgeted relaxation search problem in which all results at a particular relaxation cost must be produced as cheaply as possible. We show that the latter problem is solvable exactly in two hierarchical dimensions and is NP-hard in three or more dimensions, but admits efficient approximation algorithms with provable guarantees. We present experimental results evaluating our algorithms on both synthetic and real data, showing order-of-magnitude improvements over the baseline algorithm.
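The two-part decomposition of query processing can be illustrated with a small brute-force sketch. It is not the algorithm of the paper: the toy corpus, the function names (`relaxation_cost`, `results_within_budget`, `top_k`), the unit cost per taxonomy level, and the budget-doubling schedule are all assumptions made for illustration. The outer loop stands in for the online search over the relaxation budget, and the inner function stands in for the budgeted relaxation search, here done by a plain scan rather than an index.

```python
# Hypothetical toy corpus: (doc id, terms, topic path, location path).
# Each path lists taxonomy nodes from the root down to the document's node.
DOCS = [
    ("d1", {"pizza", "oven"}, ("food", "italian"), ("us", "ca", "sf")),
    ("d2", {"pizza"}, ("food", "italian"), ("us", "ca", "la")),
    ("d3", {"pizza"}, ("food", "american"), ("us", "ny", "nyc")),
    ("d4", {"sushi"}, ("food", "japanese"), ("us", "ca", "sf")),
]


def relaxation_cost(restriction, doc_path):
    """Levels the restriction must move toward the root before it covers doc_path.

    Assumption: every level of generalization costs 1.
    """
    common = 0
    for a, b in zip(restriction, doc_path):
        if a != b:
            break
        common += 1
    return len(restriction) - common


def results_within_budget(keywords, restrictions, budget):
    """Budgeted step (brute force): keyword matches with total relaxation cost <= budget."""
    hits = []
    for doc_id, terms, *paths in DOCS:
        if not keywords <= terms:  # every query keyword must occur in the document
            continue
        cost = sum(relaxation_cost(r, p) for r, p in zip(restrictions, paths))
        if cost <= budget:
            hits.append((cost, doc_id))
    return sorted(hits)  # cheapest relaxations first


def top_k(keywords, restrictions, k):
    """Online step: grow the budget until k results appear or nothing is left to relax."""
    max_budget = sum(len(r) for r in restrictions)  # all restrictions relaxed to the root
    budget = 0
    while True:
        hits = results_within_budget(keywords, restrictions, budget)
        if len(hits) >= k or budget >= max_budget:
            return hits[:k]
        budget = min(max_budget, max(1, 2 * budget))  # assumed doubling schedule


query = ({"pizza"}, [("food", "italian"), ("us", "ca", "sf")])
print(top_k(*query, k=2))
```

With this toy data, asking for two results forces one level of relaxation in the location dimension (from `sf` up to `ca`), so the second result carries a relaxation cost of 1. A real system would replace the linear scan with index structures; the sketch only shows how the budget search and the budgeted retrieval fit together.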