A Monotonic On-Line Linear Algorithm for Hierarchical Agglomerative Classification

Authors:
Andreea B. Dragut;Codrin M. Nichitiu
Affiliations:
Department of Operations Planning and Control, Faculty of Technological Management, Technical University of Eindhoven, Pav. F10, Den Dolech 2, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands;EURISE, Faculté des Sciences et Techniques, Université Jean Monnet Saint Étienne 23, rue du Dr. Paul Michelon, F-42034 St Etienne Cedex 2, France codrin.nichitiu@univ-st-eti ...
Venue:
Information Technology and Management
Year:
2004

Abstract

We start from an algorithm for on-line linear hierarchical classification of multidimensional data that uses a centroid aggregation criterion. After describing some real-life on-line settings where it can be used, we analyze it mathematically in the framework of the Lance–Williams algorithms and prove that it lacks some useful properties: it is neither monotonic nor space-conserving. To retain its on-line capabilities, we modify it and show that the modified algorithm is monotonic. Although the new algorithm still lacks the internal similarity-external dissimilarity property, its worst-case classifications can be corrected with a small additional computational effort, for an overall running time of O(n⋅k) for n points and k classes. An experimental study confirms the theoretical improvements over the initial algorithm. A theoretical and experimental comparison with other algorithms from the literature shows that it is among the fastest and performs well.
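
The non-monotonicity mentioned in the abstract can be illustrated with the generic centroid criterion. The sketch below is not the authors' on-line algorithm. It is a minimal batch example, assuming squared Euclidean dissimilarities and three hand-picked points, that applies the standard Lance–Williams update for the centroid criterion and shows a merge level lower than the previous one (an inversion). All function and variable names are hypothetical.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lance_williams_centroid(d_ki, d_kj, d_ij, n_i, n_j):
    """Lance-Williams update for the centroid criterion.

    Inputs are squared Euclidean dissimilarities; n_i and n_j are the
    sizes of the two clusters being merged.
    """
    n = n_i + n_j
    return (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n ** 2) * d_ij

# Three illustrative points (an assumption, chosen to form a
# nearly equilateral triangle).
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# Step 1: a and b are the closest pair, so they merge first.
first_merge = sq_dist(a, b)
assert first_merge < sq_dist(a, c) and first_merge < sq_dist(b, c)

# Step 2: dissimilarity between c and the merged cluster {a, b},
# computed via the update formula and, as a check, directly.
second_merge = lance_williams_centroid(
    sq_dist(c, a), sq_dist(c, b), sq_dist(a, b), 1, 1
)
direct = sq_dist(c, centroid([a, b]))

# A monotonic criterion would give second_merge >= first_merge.
inversion = second_merge < first_merge

print(f"first merge level:  {first_merge:.2f}")
print(f"second merge level: {second_merge:.2f}")
print(f"inversion: {inversion}")
```

Here the second merge happens at a lower level than the first, which is exactly the behaviour a monotonic criterion rules out. How the paper's modification removes such inversions in the on-line setting is not shown here.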