This dissertation takes a relationship-based approach to cluster analysis of high-dimensional data (1,000 dimensions or more). It sidesteps the ‘curse of dimensionality’ by working in a suitable similarity space instead of the original feature space. We propose two frameworks that leverage graph algorithms to achieve relationship-based clustering and visualization, respectively. In the visualization framework, the output of the clustering algorithm is used to reorder the data points so that the resulting permuted similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. Results on retail transaction, document (bag-of-words), and web-log data show that our approach can yield superior results while also taking additional balance constraints into account.

The choice of similarity measure is a critical step in relationship-based clustering, and this motivates our systematic comparative study of the impact of similarity measures on the quality of document clusters. The key findings of our experimental study are: (i) cosine, correlation, and extended Jaccard similarities perform comparably; (ii) Euclidean distances do not work well; (iii) graph partitioning tends to be superior to k-means and SOMs, especially when balanced clusters are desired; and (iv) performance curves generally do not cross. We also propose a cluster quality evaluation measure based on normalized mutual information and find an analytical relation between similarity measures.

It is widely recognized that combining multiple classification or regression models typically provides superior results compared to using a single, well-tuned model. However, there are no well-known approaches to combining multiple clusterings. The idea of combining cluster labelings without accessing the original features leads to a general knowledge reuse framework that we call cluster ensembles. We propose a formal definition of the cluster ensemble as an optimization problem.
Taking a relationship-based approach, we propose three effective and efficient combining algorithms, based on a hypergraph model, for solving this problem heuristically. Results on synthetic as well as real data sets show that cluster ensembles can (i) improve quality and robustness, (ii) enable distributed clustering, and (iii) speed up processing significantly with little loss in quality.
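
The similarity measures named in the comparative study can be illustrated with a minimal sketch. The function names and the toy vectors below are assumptions introduced for illustration only. The formulas are the common textbook definitions, and the dissertation's exact formulations may differ.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (0.0 if either vector is all zeros)."""
    norm_x = math.sqrt(dot(x, x))
    norm_y = math.sqrt(dot(y, y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot(x, y) / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


def extended_jaccard(x, y):
    """Extended Jaccard (Tanimoto) similarity: x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = dot(x, y)
    denom = dot(x, x) + dot(y, y) - xy
    if denom == 0.0:
        return 0.0  # both vectors are all zeros; 0.0 is a convention chosen here
    return xy / denom


def euclidean_distance(x, y):
    """Plain Euclidean distance, shown for contrast (a distance, not a similarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical bag-of-words counts: same term proportions, different lengths.
short_doc = [1, 1, 0]
long_doc = [3, 3, 0]

print("cosine:          ", cosine_similarity(short_doc, long_doc))
print("correlation:     ", pearson_correlation(short_doc, long_doc))
print("extended Jaccard:", extended_jaccard(short_doc, long_doc))
print("Euclidean dist.: ", euclidean_distance(short_doc, long_doc))
```

On these toy vectors, cosine and correlation ignore the difference in document length, while extended Jaccard and Euclidean distance are both sensitive to it. This is only an illustration of how the measures behave, not a reproduction of the study's experiments.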
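
The idea of combining cluster labelings without access to the original features can be sketched as follows. This is a hedged illustration, not the dissertation's hypergraph-based algorithms: it swaps in a simple co-association heuristic (objects that share a cluster in more than half of the labelings are linked) and scores the result by its average normalized mutual information with the ensemble. Normalizing by the geometric mean of the entropies, the 0.5 threshold, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)); 0.0 by convention if either entropy is 0."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def co_association(ensemble):
    """Fraction of labelings in which each pair of objects shares a cluster."""
    n, r = len(ensemble[0]), len(ensemble)
    return [
        [sum(lab[i] == lab[j] for lab in ensemble) / r for j in range(n)]
        for i in range(n)
    ]


def consensus_by_threshold(ensemble, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold; return connected components."""
    sim = co_association(ensemble)
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded similarity graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def average_nmi(candidate, ensemble):
    """Mean NMI between a candidate labeling and every labeling in the ensemble."""
    return sum(nmi(candidate, lab) for lab in ensemble) / len(ensemble)


# Hypothetical ensemble of three labelings of six objects (label names are arbitrary).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, with labels swapped
    [0, 0, 1, 1, 2, 2],
]
consensus = consensus_by_threshold(ensemble)
print(consensus)  # [0, 0, 0, 1, 1, 1]
print(average_nmi(consensus, ensemble))
```

Because only the label vectors are used, the same sketch applies when the individual clusterings were produced at different sites or from different feature subsets, which is the setting the knowledge reuse framing describes.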