Specialty mining

Authors:
Hanuma Kumar;Rohit Paravastu;Vikram Pudi
Affiliations:
International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India
Venue:
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Year:
2010

Abstract

In this paper, we consider the problem of mining the special properties of a given record in a relational dataset. In our formulation, a property is a combination of multiple attribute-value pairs, and the support of a property is the number of records that satisfy it. We consider a property special if its support comes as a shock to us and the measure of this shock factor exceeds a user-defined threshold η. We define this notion of shock based on entropy. We also output the shock factor for records in the dataset in a convenient, easily interpretable manner, and we provide an illustrated example of how users can interpret the results. Experiments on real and synthetic data sets reveal interesting properties of data records that cannot be mined using traditional approaches.
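
To make the definitions of property and support concrete, the following is a minimal sketch, not the authors' implementation. The toy records, the helper names `support` and `properties_of`, and the `max_size` cap are all assumptions introduced for illustration. The entropy-based shock factor is not specified in the abstract, so it is deliberately left out.

```python
from itertools import combinations

def support(records, prop):
    """Count the records that match every attribute-value pair in prop."""
    return sum(all(r.get(a) == v for a, v in prop.items()) for r in records)

def properties_of(record, max_size=2):
    """Yield each property of one record: every combination of up to
    max_size of its attribute-value pairs (max_size is a hypothetical cap)."""
    items = sorted(record.items())
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            yield dict(combo)

# Toy relational table (made-up data, for illustration only).
records = [
    {"city": "Hyderabad", "degree": "PhD", "field": "CS"},
    {"city": "Hyderabad", "degree": "MS", "field": "CS"},
    {"city": "Delhi", "degree": "MS", "field": "EE"},
    {"city": "Delhi", "degree": "PhD", "field": "CS"},
]

# Support of every property of the first record.
target = records[0]
supports = [(prop, support(records, prop)) for prop in properties_of(target)]
for prop, count in supports:
    print(prop, count)
```

A full pipeline would then score each support value with the paper's entropy-based shock measure and keep only the properties whose score exceeds η. That scoring step is not reproduced here.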