Computational geometry: an introduction
Computational geometry: an introduction
A probabilistic relational algebra for the integration of information retrieval and database systems
ACM Transactions on Information Systems (TOIS)
XMill: an efficient compressor for XML data
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
GESS: a scalable similarity-join algorithm for mining large data sets in high dimensional spaces
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Modern Information Retrieval
Storing and querying ordered XML using a relational database system
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Minimal probing: supporting expensive predicates for top-k queries
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Scalable Sweeping-Based Spatial Join
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Indexing and Querying XML Data for Regular Path Expressions
Proceedings of the 27th International Conference on Very Large Data Bases
Querying XML Views of Relational Data
Proceedings of the 27th International Conference on Very Large Data Bases
Fast Incremental Indexing for Full-Text Information Retrieval
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Searching XML documents via XML fragments
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
XRANK: ranked keyword search over XML documents
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
A normal form for XML documents
ACM Transactions on Database Systems (TODS)
On the Integration of Structure Indexes and Inverted Lists
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Colorful XML: one hierarchy isn't enough
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
ORDPATHs: insert-friendly XML node labels
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
An efficient and versatile query engine for TopX search
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Flexible and efficient XML search with complex full-text predicates
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Trio: a system for data, uncertainty, and lineage
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
A time machine for text search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Progressive merge join: a generic and non-blocking sort-based join algorithm
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
XSEarch: a semantic search engine for XML
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Path queries on compressed XML
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
The BEA/XQRL streaming XQuery processor
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Indexing shared content in information retrieval systems
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
PDFMeat: managing publications on the semantic desktop
Proceedings of the 20th ACM international conference on Information and knowledge management
Experiment explorer: lightweight provenance search over metadata
TaPP'12 Proceedings of the 4th USENIX conference on Theory and Practice of Provenance
Hi-index | 0.01 |
Google and other products have revolutionized the way we search for information. There are, however, still a number of research challenges. One challenge that arises specifically in desktop search is to exploit the structure and semantics of documents, as defined by the application program that generated the data (e.g., Word, Excel, or Outlook). The current generation of search products does not understand these structures and therefore often returns wrong results. This paper shows how today's search technology can be extended in order to take the specific semantics of certain structures into account. The key idea is to extend inverted file index structures with predicates which encode the circumstances under which certain keywords of a document become visible to a user. This paper provides a framework that allows to express the semantics of structures in documents and algorithms to construct enhanced, predicate-based indexes. Furthermore, this paper shows how keyword and phrase queries can be processed efficiently on such enhanced indexes. It is shown that the proposed approach has superior retrieval performance with regard to both recall and precision and has tolerable space and query running time overheads.