Expert Systems with Applications: An International Journal
Document clustering is a powerful technique for detecting topics and their relations for information browsing, analysis, and organization. However, clustered documents require descriptive titles to be assigned afterwards to help users interpret the results. Existing techniques often label clusters using only the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm that creates generic titles based on external resources such as WordNet is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms by a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results show that our method performs as anticipated. Real-case applications of these generic terms show promise in helping humans interpret the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
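As a rough illustration of the second step, mapping cluster descriptors to a generic term through a hypernym search, the following minimal sketch may help. It is not the algorithm evaluated in the paper. The toy hierarchy `TOY_HYPERNYMS` is a hypothetical stand-in for WordNet, and the scoring rule (choose the ancestor that covers the most descriptors, breaking ties in favor of the more specific one) is an assumption made for this example. The names `hypernym_chain` and `generic_label` are likewise illustrative.

```python
from collections import Counter

# Hypothetical toy hierarchy (term -> its hypernym); a stand-in for WordNet.
TOY_HYPERNYMS = {
    "sedan": "car",
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
    "apple": "fruit",
    "fruit": "entity",
}


def hypernym_chain(term, hypernyms):
    """Return [term, parent, grandparent, ...] up to the top of the hierarchy."""
    chain = [term]
    while chain[-1] in hypernyms:
        chain.append(hypernyms[chain[-1]])
    return chain


def generic_label(descriptors, hypernyms, root="entity"):
    """Pick a generic label for a set of cluster descriptors.

    Assumed scoring rule: the candidate covering the most descriptors wins;
    ties go to the candidate farthest from the root (the most specific one).
    The root itself is skipped because it is too generic to be informative.
    """
    coverage = Counter()
    for term in set(descriptors):
        for candidate in hypernym_chain(term, hypernyms):
            if candidate != root:
                coverage[candidate] += 1
    if not coverage:
        return None

    def score(candidate):
        depth = len(hypernym_chain(candidate, hypernyms)) - 1
        return (coverage[candidate], depth)

    return max(coverage, key=score)


if __name__ == "__main__":
    print(generic_label(["sedan", "car", "truck"], TOY_HYPERNYMS))
    print(generic_label(["car", "truck", "bicycle"], TOY_HYPERNYMS))
```

In this toy setting, descriptors that share a close common ancestor receive a specific label, while a more heterogeneous set falls back to a broader one. Swapping the dictionary for another hierarchical resource would leave the search itself unchanged, which mirrors the extensibility the abstract mentions.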