With the explosion in the amount of semi-structured data that users access and store in personal information management systems, there is a need for complex search tools that retrieve often very heterogeneous data in a simple and efficient way. Existing tools usually index text content, allowing for some IR-style ranking on the textual part of the query, but consider structure (e.g., file directory) and metadata (e.g., date, file type) only as filtering conditions. We propose a novel multi-dimensional approach to semi-structured data search in personal information management systems that allows users to provide fuzzy structure and metadata conditions in addition to keyword conditions. Our techniques provide a complex query interface that is more comprehensive than content-only search, as it considers three query dimensions (content, structure, metadata). We propose techniques to score each dimension individually, as well as a framework to integrate the three dimension scores into a meaningful unified score. Our work is integrated in Wayfinder, an existing, fully functioning file system. We perform a thorough experimental evaluation of our techniques to show the effect of approximating individual dimensions on the overall scores and ranks of files, as well as on query performance. Our experiments show that our scoring strategy adequately takes into account the approximation in each dimension to efficiently evaluate fuzzy multi-dimensional queries. In addition, fuzzy query conditions in non-content dimensions can significantly improve scoring (and thus ranking) accuracy.
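To make the idea of scoring three dimensions and merging them concrete, the following is a minimal illustrative sketch in Python. It is not the scoring scheme of the paper, which the abstract does not specify. Every name here (`FileEntry`, `content_score`, `structure_score`, `metadata_score`, `unified_score`), the linear date decay, and the weighted-average combination are assumptions chosen only to show how a fuzzy condition contributes a partial score instead of acting as a hard filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileEntry:
    """Hypothetical record for one indexed file."""
    path: str        # e.g. "/home/ann/papers/edbt/draft.tex"
    terms: set       # content terms extracted from the file
    modified: date   # last-modified date (one metadata attribute)


def content_score(query_terms, entry):
    """Fraction of query terms found in the file (stand-in for an IR score)."""
    if not query_terms:
        return 1.0
    return len(query_terms & entry.terms) / len(query_terms)


def structure_score(query_dirs, entry):
    """Fraction of query directory names that occur, in order, in the path."""
    if not query_dirs:
        return 1.0
    parts = entry.path.strip("/").split("/")
    matched, pos = 0, 0
    for name in query_dirs:
        if name in parts[pos:]:
            pos = parts.index(name, pos) + 1
            matched += 1
    return matched / len(query_dirs)


def metadata_score(target, entry, tolerance_days=30):
    """1.0 on the target date, decaying linearly to 0.0 at tolerance_days."""
    gap = abs((entry.modified - target).days)
    return max(0.0, 1.0 - gap / tolerance_days)


def unified_score(scores, weights=(1.0, 1.0, 1.0)):
    """Assumed combination: weighted average of the per-dimension scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def rank(files, query_terms, query_dirs, target):
    """Score every file on all three dimensions and sort best-first."""
    scored = []
    for f in files:
        dims = (
            content_score(query_terms, f),
            structure_score(query_dirs, f),
            metadata_score(target, f),
        )
        scored.append((unified_score(dims), f.path))
    return sorted(scored, reverse=True)


# Made-up example data, for illustration only.
files = [
    FileEntry("/home/ann/papers/edbt/draft.tex",
              {"fuzzy", "search", "score"}, date(2008, 3, 1)),
    FileEntry("/home/ann/photos/trip.jpg", {"beach"}, date(2008, 3, 10)),
]
ranking = rank(files, {"fuzzy", "search"}, ["papers", "vldb"], date(2008, 3, 16))
```

In this toy query the draft file matches only one of the two requested directory names and is 15 days away from the target date, so a filter-based tool would reject it. Under the assumed fuzzy scoring it still ranks first, because the partial structure and metadata scores are combined with its full content score.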