Multimodal News Story Clustering With Pairwise Visual Near-Duplicate Constraint

Authors:
Xiao Wu;Chong-Wah Ngo;A. G. Hauptmann
Affiliations:
Carnegie Mellon Univ., Pittsburgh;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2008

Citing 0
Cited 5

Large-scale news topic tracking and key-scene ranking with video near-duplicate constraints
LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining

News Topic Tracking and Re-ranking with Query Expansion Based on Near-Duplicate Detection
PCM '09 Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing

Consumer photo management and browsing facilitated by near-duplicate detection with feature filtering
Journal of Visual Communication and Image Representation

News story clustering from both what and how aspects: using bag of word model and affinity propagation
AIEMPro '11 Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services

PageRank with text similarity and video near-duplicate constraints for news story re-ranking
MMM'10 Proceedings of the 16th international conference on Advances in Multimedia Modeling

Abstract

Story clustering is a critical step for news retrieval, topic mining, and summarization. Nonetheless, the task remains highly challenging because news topics form clusters of varying densities, shapes, and sizes, and traditional algorithms are ineffective at mining such clusters. This paper offers a new perspective by exploring the pairwise visual cues derived from near-duplicate keyframes (NDK) for constraint-based clustering. We propose a constraint-driven co-clustering algorithm (CCC), which uses near-duplicate constraints built on top of text to mine topic-related stories and outliers. With CCC, the duality between stories and their underlying multimodal features is exploited to transform the features into a low-dimensional space with normalized cut. The visual constraints are added directly to this new space, while the traditional DBSCAN is revisited to capitalize on the availability of constraints and the reduced-dimensional space. We modify DBSCAN with two new characteristics for story clustering: 1) constraint-based centroid selection and 2) adaptive radius. Experiments on the TRECVID-2004 corpus demonstrate that CCC with visual constraints is more capable of mining news topics of varying densities, shapes, and sizes than traditional k-means, DBSCAN, and spectral co-clustering algorithms.
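
The pipeline outlined in the abstract (a normalized-cut embedding of stories, followed by constraint-seeded density clustering with a per-cluster radius) might be sketched roughly as follows. This is an illustrative simplification and not the authors' CCC implementation: the abstract does not give the exact formulation, so the function names, the `scale` and `min_radius` parameters, and the radius rule are all assumptions. The embedding follows the standard spectral co-clustering relaxation of normalized cut on a story-by-feature matrix, and the second step is a plain seed-and-radius assignment rather than a full modified DBSCAN.

```python
import numpy as np

def cocluster_embedding(A, k):
    """Embed stories (rows of a nonnegative story-by-feature matrix A)
    in k dimensions via the spectral relaxation of bipartite normalized cut."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)          # story degrees
    d2 = A.sum(axis=0)          # feature degrees
    d1[d1 == 0] = 1.0           # guard against empty rows/columns
    d2[d2 == 0] = 1.0
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, _ = np.linalg.svd(An, full_matrices=False)
    # Skip the first (trivial) singular vector, keep the next k.
    return U[:, 1:k + 1] / np.sqrt(d1)[:, None]

def constrained_density_clustering(Z, must_link, scale=1.5, min_radius=0.1):
    """Hypothetical simplification: seed one cluster per connected group of
    near-duplicate (must-link) stories, centre it on the group centroid, and
    absorb unlabeled stories within a radius adapted to the group's spread.
    Stories outside every radius keep the label -1 (outliers)."""
    Z = np.asarray(Z, dtype=float)
    n = len(Z)
    labels = -np.ones(n, dtype=int)

    # Union-find to merge must-link pairs into groups.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in sorted({i for pair in must_link for i in pair}):
        groups.setdefault(find(i), []).append(i)

    for c, members in enumerate(groups.values()):
        centroid = Z[members].mean(axis=0)                # constraint-based centroid
        spread = np.linalg.norm(Z[members] - centroid, axis=1).max()
        radius = max(scale * spread, min_radius)          # assumed adaptive-radius rule
        dist = np.linalg.norm(Z - centroid, axis=1)
        labels[(dist <= radius) & (labels == -1)] = c
        labels[members] = c
    return labels
```

In this sketch, `must_link` would hold index pairs of stories that share a near-duplicate keyframe. Tight groups of constrained stories get a small radius and loose groups a larger one, which is one plausible reading of how an adaptive radius could accommodate clusters of differing density.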