Combining co-clustering with noise detection for theme-based summarization

Authors:
Xiaoyan Cai; Wenjie Li; Renxian Zhang
Affiliations:
Northwest Agricultural and Forestry University, Shaanxi, China; The Hong Kong Polytechnic University, Hung Hom, Hong Kong; The Hong Kong Polytechnic University and Samsung Electronics Research Center, China
Venue:
ACM Transactions on Speech and Language Processing (TSLP)
Year:
2014

Abstract

Because sentences are short and carry limited content, we treat words as independent text objects rather than as features of sentences during sentence clustering, and we develop two co-clustering frameworks, integrated clustering and interactive clustering, that cluster sentences and words simultaneously. Since real-world datasets always contain noise, we incorporate noise detection and removal to improve the clustering of both sentences and words. We also explore a semi-supervised approach that incorporates query information (and sentence information from earlier document sets) into theme-based summarization. Thorough experiments on the DUC 2005-2007 and TAC 2008-2009 datasets show that the two noise-detecting co-clustering approaches perform comparably to the top three systems. The results also show that interactive clustering with noise detection is more effective than integrated clustering with noise detection.
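The abstract does not give the clustering or noise-detection formulas, so the following is only a rough illustration of the general idea of clustering sentences and words together and then flagging noisy objects. It is a minimal sketch that assumes a nonnegative sentence-by-word matrix `A`. It uses bipartite spectral graph partitioning (Dhillon, 2001), a standard co-clustering technique that is not necessarily either of the frameworks proposed here, followed by a hypothetical distance-to-centroid rule for noise. The function names, the threshold `z`, and the toy data are illustrative assumptions.

```python
import numpy as np


def kmeans(Z, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Pick the point farthest from every center chosen so far
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def spectral_cocluster(A, k):
    """Embed sentences (rows) and words (columns) of A in one space, then cluster.

    Returns the joint embedding Z, one label per object (the first A.shape[0]
    labels are sentences, the rest are words), and the cluster centroids.
    """
    A = np.asarray(A, dtype=float)
    d1, d2 = A.sum(axis=1), A.sum(axis=0)
    assert (d1 > 0).all() and (d2 > 0).all(), "drop empty sentences/words first"
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    l = max(1, int(np.ceil(np.log2(k))))
    # Skip the trivial first singular pair; stack sentence and word coordinates
    Z = np.vstack([U[:, 1:l + 1] / np.sqrt(d1)[:, None],
                   Vt[1:l + 1, :].T / np.sqrt(d2)[:, None]])
    labels, centers = kmeans(Z, k)
    return Z, labels, centers


def flag_noise(Z, labels, centers, z=2.0):
    """Hypothetical noise rule: flag objects unusually far from their own centroid."""
    dist = np.linalg.norm(Z - centers[labels], axis=1)
    return dist > dist.mean() + z * dist.std()


# Toy data (assumed): sentences 0-2 use words 0-2, sentences 3-5 use words 3-5,
# with one weak cross-theme link between sentence/word 2 and sentence/word 5.
A = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 1],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 1, 1, 1, 2]])
Z, labels, centers = spectral_cocluster(A, k=2)
print("sentence clusters:", labels[:6], "word clusters:", labels[6:])
print("noise flags:", flag_noise(Z, labels, centers))
```

Because sentences and words share one embedding, each sentence cluster comes paired with a word cluster, which is the point of treating words as objects rather than features. A noise-removal pass would drop the flagged rows and columns of `A` and re-run the co-clustering; the mean-plus-`z`-standard-deviations threshold is a simple stand-in, not the noise detector used in the paper.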