Clustering web images using association rules, interestingness measures, and hypergraph partitions

Authors: Hassan H. Malik; John R. Kender
Affiliations: Columbia University, New York, NY (both authors)
Venue: ICWE '06, Proceedings of the 6th International Conference on Web Engineering
Year: 2006

Abstract

This paper presents a new approach to clustering web images. Images are first processed to extract signal features such as HSV color and quantized orientation. The web pages that refer to these images are processed to extract textual features (keywords), which are then reduced by stemming, stop-word elimination, and frequency filtering based on Zipf's law. All visual and textual features are used to generate association rules. Hypergraphs are built from these rules, with features as vertices and discovered associations as hyperedges. Twenty-two objective interestingness measures are evaluated on their ability to prune uninteresting rules and to assign weights to hyperedges. A hypergraph partitioning algorithm then generates clusters of features, and a simple scoring function assigns images to clusters. A tree-distance-based evaluation measure assesses the quality of the image clustering against manually generated ground truth. Our experiments indicate that combining textual and content-based features yields better clustering than signal-only or text-only approaches. The online steps run in real time, which makes this approach practical for web images. Furthermore, we demonstrate that statistical interestingness measures such as Correlation Coefficient, Laplace, Kappa, and J-Measure produce better clustering than traditional association rule interestingness measures such as Support and Confidence.
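To make the comparison between measures concrete, the sketch below computes Support, Confidence, Laplace, Correlation Coefficient (phi), Kappa, and J-Measure for a rule A -> B from its co-occurrence counts, using the standard textbook definitions. It then prunes rules and weights hyperedges by the chosen measure. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the paper's formulas, thresholds, or weighting scheme. The names `Counts`, `measures`, and `hyperedges`, the feature names, and the thresholds are all hypothetical. The J-Measure uses the directional Smyth and Goodman form; some surveys instead take the maximum over both rule directions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Co-occurrence counts for a rule A -> B over n transactions (hypothetically, one per image)."""
    n_ab: int  # transactions containing both A and B
    n_a: int   # transactions containing A
    n_b: int   # transactions containing B
    n: int     # total number of transactions


def _term(p_joint: float, p_cond: float, p_marg: float) -> float:
    """p_joint * log2(p_cond / p_marg), using the convention 0 * log(...) = 0."""
    return 0.0 if p_joint == 0 else p_joint * math.log2(p_cond / p_marg)


def measures(c: Counts) -> dict[str, float]:
    """Six interestingness measures for A -> B; assumes 0 < P(A) < 1 and 0 < P(B) < 1."""
    p_a, p_b, p_ab = c.n_a / c.n, c.n_b / c.n, c.n_ab / c.n
    p_a_not_b = p_a - p_ab                      # P(A, not B)
    p_neither = 1.0 - p_a - p_b + p_ab          # P(not A, not B)
    confidence = p_ab / p_a                     # P(B | A)
    chance = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected under independence
    return {
        "support": p_ab,
        "confidence": confidence,
        "laplace": (c.n_ab + 1) / (c.n_a + 2),
        "correlation": (p_ab - p_a * p_b)
        / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b)),
        "kappa": (p_ab + p_neither - chance) / (1 - chance),
        # Directional J-measure of A -> B (Smyth and Goodman form), in bits.
        "j_measure": _term(p_ab, confidence, p_b)
        + _term(p_a_not_b, p_a_not_b / p_a, 1 - p_b),
    }


def hyperedges(rules, measure: str, threshold: float) -> dict[frozenset, float]:
    """Keep rules scoring above `threshold` and weight each hyperedge by that score.

    A hyperedge is the set of all features in a rule's antecedent and consequent;
    if two rules yield the same hyperedge, the higher score is kept.
    """
    edges: dict[frozenset, float] = {}
    for (antecedent, consequent), counts in rules.items():
        score = measures(counts)[measure]
        if score > threshold:
            edge = frozenset(antecedent) | frozenset(consequent)
            edges[edge] = max(score, edges.get(edge, score))
    return edges


# Hypothetical rules mixing a visual feature ("hue_*") with a textual keyword.
rules = {
    (("hue_red",), ("sunset",)): Counts(n_ab=15, n_a=20, n_b=25, n=100),
    (("hue_blue",), ("car",)): Counts(n_ab=5, n_a=20, n_b=25, n=100),
}
print(hyperedges(rules, "support", 0.01))
print(hyperedges(rules, "correlation", 0.1))
```

In this toy data, the second rule's features co-occur exactly as often as independence predicts (5 = 20 × 25 / 100). A support filter therefore keeps both hyperedges. Correlation, Kappa, and J-Measure all score the second rule at zero, so the correlation filter drops it. This is one illustration of how statistical measures can separate chance co-occurrence from genuine association.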