Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

Authors:
René Witte; Sabine Bergler
Affiliations:
Institut für Programmstrukturen und Datenorganisation (IPD), Universität Karlsruhe (TH), Germany; Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
Venue:
CAI '07 Proceedings of the 20th conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
Year:
2007

Abstract

Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common and distinctive topics within a document set, together with the generation of multi-document summaries, can greatly ease the burden of information management. We show how this can be achieved with a clustering algorithm based on fuzzy set theory, which (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) also delivers excellent performance.