A combination of trie-trees and inverted files for the indexing of set-valued attributes

Authors:
Manolis Terrovitis;Spyros Passas;Panos Vassiliadis;Timos Sellis
Affiliations:
Nat. Technical Univ. Athens;Nat. Technical Univ. Athens;Univ. of Ioannina;Nat. Technical Univ. Athens
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 19
Cited 3

Signature files

Information retrieval
Self-indexing inverted files for fast text retrieval

ACM Transactions on Information Systems (TOIS)
Inverted files versus signature files for text indexing

ACM Transactions on Database Systems (TODS)
The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
WSQ/DSQ: a practical approach for combined querying of databases and the Web

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient and tumble similar set retrieval

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
On supporting containment queries in relational database management systems

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Object Relational DBMSs: The Next Great Wave

Object Relational DBMSs: The Next Great Wave
Modern Information Retrieval

Modern Information Retrieval
Compression of inverted indexes For fast query evaluation

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
An Efficient Indexing Technique for Full Text Databases

VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Adaptive algorithms for set containment joins

ACM Transactions on Database Systems (TODS)
Efficient processing of joins on set-valued attributes

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
A performance study of four index structures for set-valued attributes of low cardinality

The VLDB Journal — The International Journal on Very Large Data Bases
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

Data Mining and Knowledge Discovery
Efficient set joins on similarity predicates

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
On the integration of structure indexes and inverted lists

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

A hybrid index structure for set-valued attributes using itemset tree and inverted list

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I
Efficient answering of set containment queries for skewed item distributions

Proceedings of the 14th International Conference on Extending Database Technology
Efficient processing of containment queries on nested sets

Proceedings of the 16th International Conference on Extending Database Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

Set-valued attributes frequently occur in contexts like market-basked analysis and stock market trends. Late research literature has mainly focused on set containment joins and data mining without considering simple queries on set valued attributes. In this paper we address superset, subset and equality queries and we propose a novel indexing scheme for answering them on set-valued attributes. The proposed index superimposes a trie-tree on top of an inverted file that indexes a relation with set-valued data. We show that we can efficiently answer the aforementioned queries by indexing only a subset of the most frequent of the items that occur in the indexed relation. Finally, we show through extensive experiments that our approach outperforms the state of the art mechanisms and scales gracefully as database size grows.