In a data warehouse, data cubes are accessed through their dimensions. When a dimension is numerical, its values can be clustered or sorted, so fast access methods such as binary search or B+ trees can be applied. Complex attributes, such as the keyword sets of document contents, are not easily sorted or clustered, even though it is highly desirable to search documents through their sets of keywords.

Signature indices are known for their ability to search along complex attributes. We propose a new indexing structure, the dimensional signature index (DSI), for fast query processing in data cubes. DSI is particularly suitable for accessing data in data cubes through complex dimensions.

Through a mathematical analysis, we found the following: if one signature index (a feature index) is built for each dimension of the data cube, if the total size of all feature indices equals the size of one large signature index built over the entire data cube as a flat file, and if a query involves all dimensions of the data cube, then the search cost over all the feature indices is the same as the search cost in the large signature index.

The significance of this result is that a query usually does not involve all dimensions of a data cube. With one feature index per dimension, only the feature indices involved in the query predicates need to be accessed. On average, this yields significantly faster query execution than using one large signature file for the entire data cube.

The DSI scheme does not exclude the use of other fast signature index schemes. Each feature index in DSI can itself use any of the previously proposed fast signature indices (S-trees, multi-level, frame-sliced, etc.) to achieve even faster access.
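The per-dimension idea can be illustrated with a minimal sketch. It is not the paper's implementation: the class and function names, the 64-bit signature width, the three bits set per keyword, and the use of SHA-256 to pick bit positions are all assumptions made for illustration. Each dimension gets its own feature index built with superimposed coding, and a query scans only the feature indices of the dimensions named in its predicates.

```python
import hashlib

SIG_BITS = 64      # signature width in bits (assumed value)
BITS_PER_KEY = 3   # bit positions set per keyword (assumed value)


def signature(keywords):
    """Superimposed coding: OR together the bit patterns of all keywords."""
    sig = 0
    for kw in keywords:
        digest = hashlib.sha256(kw.encode("utf-8")).digest()
        for i in range(BITS_PER_KEY):
            sig |= 1 << (digest[i] % SIG_BITS)
    return sig


class FeatureIndex:
    """Hypothetical signature index for one dimension (one signature per row)."""

    def __init__(self):
        self.sigs = []

    def add(self, keywords):
        self.sigs.append(signature(keywords))

    def candidates(self, query_keywords):
        # A row qualifies if its signature covers every bit of the query
        # signature. False drops are possible; false dismissals are not.
        q = signature(query_keywords)
        return {rid for rid, s in enumerate(self.sigs) if (s & q) == q}


class DimensionalSignatureIndex:
    """Hypothetical DSI sketch: one FeatureIndex per cube dimension."""

    def __init__(self, dimensions):
        self.dimensions = list(dimensions)
        self.indexes = {d: FeatureIndex() for d in self.dimensions}
        self.rows = []

    def insert(self, row):
        # row maps each dimension name to a collection of keywords.
        stored = {d: set(row[d]) for d in self.dimensions}
        self.rows.append(stored)
        for d in self.dimensions:
            self.indexes[d].add(stored[d])

    def search(self, query):
        # query maps a subset of dimensions to required keywords.
        # Only the feature indices of those dimensions are scanned.
        result = None
        accessed = []
        for d, kws in query.items():
            accessed.append(d)
            cands = self.indexes[d].candidates(kws)
            result = cands if result is None else result & cands
        if result is None:
            result = set(range(len(self.rows)))
        # Resolve false drops by checking the stored keyword sets.
        hits = sorted(
            r for r in result
            if all(set(kws) <= self.rows[r][d] for d, kws in query.items())
        )
        return hits, accessed
```

In this sketch, a query such as `dsi.search({"topic": {"olap"}})` on an index built over the dimensions `topic` and `author` scans only the `topic` feature index, which is the source of the savings described above. Each `FeatureIndex` here is a plain sequential signature file; it could be swapped for a faster organization such as an S-tree without changing the outer structure.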