CorePhrase: keyphrase extraction for document clustering

Authors:
Khaled M. Hammouda;Diego N. Matute;Mohamed S. Kamel
Affiliations:
Department of Systems Design Engineering;School of Computer Science;Department of Electrical and Computer Engineering, Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, ON, Canada
Venue:
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2005

Abstract

Discovering the topic of a large set of text documents by identifying relevant keyphrases is usually regarded as a very tedious task when done by hand. Automatic keyphrase extraction from multi-document data sets or text clusters provides a very compact summary of the contents of the clusters, which often helps in locating information easily. We introduce an algorithm for topic discovery that extracts keyphrases from multi-document sets and clusters based on frequent and significant phrases shared between documents. The keyphrases extracted by the algorithm are highly accurate and fit the cluster topic. The algorithm is independent of the domain of the documents. Subjective as well as quantitative evaluations show that the algorithm outperforms keyword-based cluster-labeling algorithms and is capable of accurately discovering the topic, often ranking it among the top one or two extracted keyphrases.
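
The idea of labeling a cluster by phrases shared between its documents can be illustrated with a minimal Python sketch. This is not the CorePhrase algorithm itself; the abstract does not give its details. The function names (`tokenize`, `ngrams`, `shared_phrases`), the n-gram length limit, and the scoring rule (document frequency first, then phrase length) are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_len):
    """Return the set of all word n-grams of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def shared_phrases(docs, max_len=3, top_k=5):
    """Rank phrases that occur in at least two documents.

    Scoring is an illustrative assumption: higher document frequency
    first, then longer phrases, then alphabetical order.
    """
    df = Counter()
    for doc in docs:
        # Each phrase is counted once per document (document frequency).
        df.update(ngrams(tokenize(doc), max_len))
    shared = [(p, c) for p, c in df.items() if c >= 2]
    shared.sort(key=lambda pc: (-pc[1], -len(pc[0]), pc[0]))
    return [" ".join(p) for p, _ in shared[:top_k]]


# Hypothetical three-document cluster.
docs = [
    "River rafting trips in Canada",
    "Guided river rafting for beginners",
    "White water river rafting safety",
]
print(shared_phrases(docs, top_k=3))  # ['river rafting', 'rafting', 'river']
```

In this toy cluster the two-word phrase outranks its constituent words because ties in document frequency are broken in favor of longer phrases, which loosely reflects why phrase-based labels can be more descriptive than single keywords.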