Generic title labeling for clustered documents

Authors: Yuen-Hsien Tseng
Affiliations: National Taiwan Normal University, No. 162, Section 1, Heping East Road, Taipei 106, Taiwan, ROC
Venue: Expert Systems with Applications: An International Journal
Year: 2010

Abstract

Document clustering is a powerful technique for detecting topics and their relations for information browsing, analysis, and organization. However, clustered documents need descriptive titles assigned after clustering to help users interpret the results. Existing techniques often label clusters using only the terms that the clustered documents contain, which may be insufficient for some applications. To address this problem, we propose a cluster labeling algorithm that creates generic titles using external resources such as WordNet. Our method first extracts category-specific terms as cluster descriptors and then maps these descriptors to generic terms with a hypernym search algorithm. The method was evaluated on a patent document collection and a subset of the Reuters-21578 collection, and the experimental results show that it performs as anticipated. In real-case applications, these generic terms show promise in helping humans interpret the clustered topics. Our method is general enough that it can easily be extended to other hierarchical resources for adaptable label generation.
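The two steps described above (selecting category-specific descriptors for each cluster, then mapping them to generic terms through a hypernym search) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the descriptor score is a 2x2 correlation coefficient chosen as a stand-in, the hypernym vote with a per-level `decay` is an assumed heuristic, and `TOY_HYPERNYMS` is a hypothetical hand-built hierarchy used in place of WordNet so the example runs without external data.

```python
import math
from collections import Counter

# Hypothetical toy hypernym map (child -> parent), standing in for WordNet.
TOY_HYPERNYMS = {
    "wheat": "grain", "corn": "grain", "barley": "grain",
    "grain": "food", "food": "substance",
    "crude": "oil", "oil": "fuel", "gasoline": "fuel", "fuel": "substance",
}


def correlation(a, b, c, d):
    """Signed correlation coefficient of a 2x2 term/cluster table (assumed score).

    a: in-cluster docs containing the term   b: other docs containing it
    c: in-cluster docs lacking the term      d: other docs lacking it
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def cluster_descriptors(docs, labels, cluster, k=5):
    """Return up to k (term, score) pairs most positively tied to `cluster`."""
    in_df, out_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Document frequency: each term is counted once per document.
        (in_df if label == cluster else out_df).update(set(tokens))
    n_in = sum(1 for label in labels if label == cluster)
    n_out = len(labels) - n_in
    scores = {
        t: correlation(in_df[t], out_df[t], n_in - in_df[t], n_out - out_df[t])
        for t in in_df
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(t, s) for t, s in ranked[:k] if s > 0]


def hypernym_chain(term, parents, max_depth):
    """Ancestors of `term`, nearest first, at most max_depth levels up."""
    chain = []
    node = term
    while node in parents and len(chain) < max_depth:
        node = parents[node]
        chain.append(node)
    return chain


def generic_title(descriptors, parents=TOY_HYPERNYMS, max_depth=3, decay=0.5):
    """Pick the hypernym with the highest decayed vote (assumed heuristic).

    Each descriptor adds weight * decay**(level - 1) to every ancestor, so a
    hypernym shared by many descriptors wins, while decay < 1 keeps the
    choice from drifting to the most general root.
    """
    votes = Counter()
    for term, weight in descriptors:
        for level, hyper in enumerate(hypernym_chain(term, parents, max_depth), 1):
            votes[hyper] += weight * decay ** (level - 1)
    if not votes:
        return None  # no descriptor has a known hypernym
    # Iterate in sorted order so ties resolve alphabetically.
    return max(sorted(votes), key=lambda h: votes[h])


# Usage on a tiny hypothetical two-cluster corpus.
docs = [
    ["wheat", "corn", "export", "price"],
    ["wheat", "barley", "harvest", "price"],
    ["corn", "barley", "export"],
    ["crude", "oil", "price", "opec"],
    ["crude", "gasoline", "refinery"],
    ["oil", "gasoline", "price"],
]
labels = [0, 0, 0, 1, 1, 1]
for c in (0, 1):
    print(c, generic_title(cluster_descriptors(docs, labels, c)))
# 0 grain
# 1 fuel
```

Descriptors such as `export` that have no entry in the hierarchy simply contribute no votes. In practice the toy map would be replaced by a real lexical hierarchy such as WordNet's hypernym relation; raising `decay` toward 1 favors broader terms, while lowering it favors narrower ones.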