Affinity analysis of coded data sets

  • Authors: Tapio Pitkaranta
  • Affiliations: Helsinki University of Technology, Helsinki, Finland
  • Venue: Proceedings of the 2009 EDBT/ICDT Workshops
  • Year: 2009

Quantified Score

Hi-index 0.01

Abstract

Coded data sets are commonly used as compact representations of real-world processes. Such data sets have been studied in research fields ranging from association mining, data warehousing, knowledge discovery, and collaborative filtering to machine learning. However, previous studies on coded data sets have introduced methods for analyzing rather small data sets. This study proposes applying information retrieval techniques to enable high-performance analysis of data volumes that scale beyond traditional approaches. Part of this PhD study focuses on a new type of kernel projection function that can be used to find similarities in sparse discrete data spaces. The study presents experimental results showing how information retrieval indexes scale and outperform two common relational data schemas on a leading commercial DBMS for market basket analysis.
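
As a rough illustration of the idea of using an information retrieval index for market basket analysis, the sketch below builds a toy inverted index (item code to set of transaction ids) and computes pair support and rule confidence by intersecting posting sets. This is not the paper's implementation: the abstract does not specify the index structure, and the function names, the use of Python sets as posting lists, and the sample baskets are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Map each item code to the set of transaction ids containing it.

    Hypothetical helper: a Python set stands in for a posting list.
    """
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def pair_support(index, a, b):
    """Count transactions containing both a and b via posting-set intersection."""
    return len(index.get(a, set()) & index.get(b, set()))


def confidence(index, a, b):
    """Estimate the confidence of the rule a -> b; 0.0 if a never occurs."""
    postings_a = index.get(a, set())
    if not postings_a:
        return 0.0
    return pair_support(index, a, b) / len(postings_a)


# Made-up sample baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
index = build_inverted_index(transactions)

print(pair_support(index, "bread", "milk"))
print(confidence(index, "bread", "milk"))
```

In this layout, an affinity query touches only the posting sets of the items involved rather than scanning every transaction row, which is one plausible reason an index-based approach could scale differently from a relational schema; the paper's own experiments are the authority on actual performance.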