Topic detection using MFSs

Authors:
Ivan Yap;Han Tong Loh;Lixiang Shen;Ying Liu
Affiliations:
Department of Mechanical Engineering, Blk EA 07-08, National University of Singapore, Singapore;Department of Mechanical Engineering, Blk EA 07-08, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, Blk E4 01-07, National University of Singapore, Singapore;Singapore MIT Alliance, E4-04-10, National University of Singapore, Singapore
Venue:
IEA/AIE'06 Proceedings of the 19th International Conference on Advances in Applied Artificial Intelligence: Industrial, Engineering and Other Applications of Applied Intelligent Systems
Year:
2006

Abstract

When analyzing a document collection, a key piece of information is the number of distinct topics it contains. Document clustering has been used as a tool to facilitate the extraction of such information. However, existing clustering methods do not take into account the sequences of words in the documents, and usually offer no means of describing the contents of each topic cluster. In this paper, we report our investigation and results on using Maximal Frequent word Sequences (MFSs) as building blocks for identifying distinct topics. The supporting documents of MFSs are grouped into an equivalence class and then linked to a topic cluster, with the MFSs serving as the document cluster identifier. We describe the original method for extracting the set of MFSs and how it can be adapted to identify topics in a textual dataset. We also demonstrate how the MFSs themselves can act as topic descriptors for the clusters. Finally, a benchmarking study against existing clustering methods, namely k-means and the EM algorithm, shows the effectiveness of our approach for topic detection.
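
The idea of using MFSs both to group documents and to label the groups can be illustrated with a small sketch. It is not the extraction method of the paper, which the abstract does not specify. It assumes a simplified definition: a word sequence is frequent if it occurs contiguously in at least `min_support` documents, and maximal if no longer frequent sequence contains it. The function names, the `min_support` and `max_len` parameters, and the rule of grouping MFSs that share an identical set of supporting documents are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def frequent_ngrams(docs, min_support, max_len):
    """Map each contiguous word sequence to the ids of the documents containing it,
    keeping only sequences found in at least min_support documents."""
    support = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                support[tuple(words[i:i + n])].add(doc_id)
    return {seq: ids for seq, ids in support.items() if len(ids) >= min_support}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_frequent_sequences(docs, min_support=2, max_len=5):
    """Keep frequent sequences that are not contained in a longer frequent one.
    Assumption: sequences are capped at max_len words, so maximality is only
    judged among sequences up to that length."""
    frequent = frequent_ngrams(docs, min_support, max_len)
    return {
        seq: ids
        for seq, ids in frequent.items()
        if not any(len(other) > len(seq) and contains(other, seq) for other in frequent)
    }


def group_by_supporting_documents(mfs):
    """Assumed grouping rule: MFSs with the same supporting documents form one
    cluster, and the MFSs themselves act as that cluster's descriptors."""
    clusters = defaultdict(list)
    for seq, ids in mfs.items():
        clusters[frozenset(ids)].append(" ".join(seq))
    return dict(clusters)


# Toy collection with two obvious topics (made-up example data).
docs = [
    "maximal frequent sequences describe topics",
    "we use maximal frequent sequences here",
    "document clustering with k-means",
    "document clustering is common",
]
for doc_ids, descriptors in group_by_supporting_documents(
    maximal_frequent_sequences(docs, min_support=2)
).items():
    print(sorted(doc_ids), descriptors)
```

The brute-force enumeration is only practical for small collections; it is meant to show how a sequence's supporting documents define a cluster and how the sequence doubles as a readable label, not to stand in for an efficient MFS extraction algorithm.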