Clustering seasonality patterns in the presence of errors

Authors:
Mahesh Kumar;Nitin R. Patel;Jonathan Woo
Affiliations:
Operations Research Center, MIT, Cambridge, MA;Sloan School of Management, MIT, Cambridge, MA;ProfitLogic Inc., Cambridge, MA
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

Clustering, the task of grouping similar data points, is a very well studied problem. Most traditional clustering algorithms assume that the data are provided without measurement error. Real-world data sets, however, often contain such errors, and estimates of those errors can frequently be obtained. We present a clustering method that incorporates the information contained in these error estimates, built on a new distance function that is based on the distribution of errors in the data. Under a Gaussian model for the errors, the distance function follows a chi-square distribution and is easy to compute. This distance function is used in hierarchical clustering to discover meaningful clusters. It is scale-invariant, so clustering results are independent of the units in which the data are measured. In the special case where the error distribution is the same for each attribute of the data points, the rank order of pairwise distances is the same for our distance function and the Euclidean distance function. The clustering method is applied to the seasonality estimation problem, and experimental results are presented for retail industry data as well as for simulated data, where it outperforms classical clustering methods.
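
The abstract does not state the distance function itself, so the following is only a minimal sketch of one form that matches the properties it describes. It assumes independent Gaussian errors with a known variance for each attribute of each point, and divides each squared attribute difference by the sum of the two points' error variances. The function names, the random data, and the variance values are all hypothetical and are not taken from the paper. The sketch checks two of the stated properties numerically: invariance to a change of units, and agreement with the Euclidean rank order when every error variance is equal.

```python
import numpy as np


def error_weighted_distance(x_i, x_j, var_i, var_j):
    """Assumed form: squared attribute differences scaled by summed error variances."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    pooled = np.asarray(var_i, dtype=float) + np.asarray(var_j, dtype=float)
    return float(np.sum(diff ** 2 / pooled))


def pairwise_distances(points, variances):
    """Symmetric matrix of error-weighted distances between all pairs of points."""
    n = len(points)
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            d[a, b] = d[b, a] = error_weighted_distance(
                points[a], points[b], variances[a], variances[b]
            )
    return d


# Hypothetical data: 5 points with 3 attributes and per-attribute error variances.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))
variances = rng.uniform(0.1, 1.0, size=(5, 3))
d = pairwise_distances(points, variances)

# Scale invariance: changing the units of an attribute by a factor c
# multiplies its error variance by c**2, leaving the distance unchanged.
scale = np.array([10.0, 0.5, 1000.0])
d_scaled = pairwise_distances(points * scale, variances * scale ** 2)
scale_invariant = np.allclose(d, d_scaled)

# Equal error variance everywhere: the distance is a monotone function of the
# Euclidean distance, so the rank order of the pairs is the same.
equal_var = np.full_like(points, 0.25)
d_equal = pairwise_distances(points, equal_var)
euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
upper = np.triu_indices(len(points), k=1)
same_rank_order = np.array_equal(np.argsort(d_equal[upper]), np.argsort(euclid[upper]))

print(scale_invariant, same_rank_order)
```

Under these assumptions, a matrix such as `d` could be passed to any hierarchical clustering routine that accepts precomputed dissimilarities. How the paper's method combines error estimates when clusters are merged is not described in the abstract and is not modeled here.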