Affinity analysis of coded data sets

Authors:
Tapio Pitkaranta
Affiliations:
Helsinki University of Technology, Helsinki, Finland
Venue:
Proceedings of the 2009 EDBT/ICDT Workshops
Year:
2009


Abstract

Coded data sets are commonly used as compact representations of real-world processes. Such data sets have been studied in research fields ranging from association mining, data warehousing, knowledge discovery, and collaborative filtering to machine learning. However, previous studies on coded data sets have introduced methods suited only to rather small data sets. This study proposes applying information retrieval techniques to enable high-performance analysis of data volumes that scale beyond traditional approaches. Part of this PhD study focuses on a new type of kernel projection function that can be used to find similarities in sparse discrete data spaces. The study presents experimental results showing how information retrieval indexes scale and outperform two common relational data schemas implemented in a leading commercial DBMS for market basket analysis.
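
To make the idea of an information retrieval index for market basket analysis concrete, the following is a minimal sketch and not the method of the paper. It assumes that each basket is a set of item codes and that affinity is measured by co-occurrence counts and Jaccard similarity over posting lists; the function names and the toy baskets are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(baskets):
    """Map each item code to the set of basket ids that contain it."""
    index = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            index[item].add(basket_id)
    return index


def support(index, items):
    """Count baskets containing every item, by intersecting posting lists."""
    # Start from the shortest posting list to keep intermediate sets small.
    postings = sorted((index.get(item, set()) for item in items), key=len)
    if not postings:
        return 0
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return len(result)


def jaccard(index, a, b):
    """Jaccard similarity of two items based on the baskets they appear in."""
    pa, pb = index.get(a, set()), index.get(b, set())
    union = len(pa | pb)
    return len(pa & pb) / union if union else 0.0


# Hypothetical toy data: basket id -> set of item codes.
baskets = {
    1: {"bread", "milk"},
    2: {"bread", "butter", "milk"},
    3: {"butter", "milk"},
    4: {"bread"},
}
index = build_inverted_index(baskets)
print(support(index, ["bread", "milk"]))  # baskets 1 and 2 contain both
print(jaccard(index, "bread", "milk"))
```

In this sketch a co-occurrence query touches only the posting lists of the items involved, whereas a relational schema would typically answer the same question with a self-join over a basket-item table. This is one plausible reading of why an index-based layout might scale better; the abstract itself does not give these details.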