Principal Direction Divisive Partitioning

Authors:
Daniel Boley
Affiliations:
Department of Computer Science and Engineering, University of Minnesota, 200 Union Street S.E., Rm 4-192, Minneapolis, MN 55455, USA. boley@cs.umn.edu
Venue:
Data Mining and Knowledge Discovery
Year:
1998

Abstract

We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.
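
The divisive procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not state the split rule or how the next cluster to split is chosen, so the code below makes two assumptions suggested by the title: a cluster is split by the sign of each sample's projection onto the leading principal direction of the centered cluster, and the cluster with the largest scatter is split next. The function names are hypothetical, and a dense SVD is used for simplicity, so the sketch does not exploit the sparsity the abstract credits for the method's efficiency.

```python
import numpy as np


def principal_direction_split(X):
    """Split the rows of X by the sign of their projection onto the
    leading principal direction of the centered data (assumed split rule)."""
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered matrix is the
    # direction of largest variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    return projection <= 0, projection > 0


def scatter(X):
    """Total squared distance of the rows of X from their centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def divisive_partition(X, n_clusters):
    """Repeatedly split the cluster with the largest scatter
    (assumed selection rule) until n_clusters clusters exist."""
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        # Only clusters with at least two samples can be split.
        candidates = [i for i, idx in enumerate(clusters) if len(idx) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda i: scatter(X[clusters[i]]))
        idx = clusters.pop(target)
        left, right = principal_direction_split(X[idx])
        if not left.any() or not right.any():
            # All samples coincide, so no further split is possible.
            clusters.append(idx)
            break
        clusters.extend([idx[left], idx[right]])
    return clusters


if __name__ == "__main__":
    # Synthetic data standing in for document vectors: two separated groups.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 5)),
                   rng.normal(10.0, 0.1, (30, 5))])
    for members in divisive_partition(X, 2):
        print(len(members), "samples")
```

Each entry of the returned list holds the row indices of one cluster. For a large sparse document matrix, the dense SVD could plausibly be replaced by an iterative solver that computes only the leading singular vector, but that variation is not shown here.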