Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractors in how the keyphrases are formed. Our method first builds clusters of topically related words and selects document keywords from them, then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The proposed method reaches an F1 score of up to 44.5%, which is below the performance of human annotators but comparable to that of a supervised approach.
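
To make the cluster, select, and expand pipeline concrete, the following is a minimal Python sketch of the general idea, not the method evaluated in the paper. Everything in it is an assumption made for illustration: the abstract does not specify the clustering procedure, the keyword selection rule, or the syntactic patterns used for expansion. Here words are clustered by sentence-level co-occurrence (connected components over pairs that co-occur at least `min_cooc` times), the most frequent word of each cluster is taken as its keyword, and a keyword is expanded to the adjacent words from the same cluster, which merely stands in for a real syntactic validity check. The names `STOPWORDS`, `cluster_words`, `expand_keyword`, and `extract_keyphrases` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Toy stopword list (assumption); a real system would use a language-specific one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def cluster_words(sentences, min_cooc=2):
    """Group content words that co-occur in at least `min_cooc` sentences.

    Clusters are the connected components of the co-occurrence graph
    (an illustrative stand-in for topical clustering).
    """
    cooc = Counter()
    vocab = set()
    for sent in sentences:
        words = sorted({w for w in sent if w not in STOPWORDS})
        vocab.update(words)
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                cooc[(a, b)] += 1

    parent = {w: w for w in vocab}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), n in cooc.items():
        if n >= min_cooc:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for w in vocab:
        clusters[find(w)].add(w)
    return list(clusters.values())


def expand_keyword(keyword, cluster, sentences):
    """Expand a keyword to its most frequent run of adjacent same-cluster words."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != keyword:
                continue
            lo, hi = i, i
            while lo > 0 and sent[lo - 1] in cluster:
                lo -= 1
            while hi + 1 < len(sent) and sent[hi + 1] in cluster:
                hi += 1
            counts[" ".join(sent[lo:hi + 1])] += 1
    # Most frequent phrase; ties are broken alphabetically.
    return min(counts, key=lambda p: (-counts[p], p))


def extract_keyphrases(raw_sentences, min_cooc=2):
    """Cluster words, pick one keyword per multi-word cluster, expand to phrases."""
    sentences = [tokenize(s) for s in raw_sentences]
    freq = Counter(w for s in sentences for w in s if w not in STOPWORDS)
    phrases = set()
    for cluster in cluster_words(sentences, min_cooc):
        if len(cluster) < 2:
            continue  # singleton clusters carry no topical evidence here
        keyword = min(cluster, key=lambda w: (-freq[w], w))
        phrases.add(expand_keyword(keyword, cluster, sentences))
    return sorted(phrases)


example = [
    "Keyphrase extraction uses word clusters",
    "Word clusters group related terms",
    "Keyphrase extraction is unsupervised",
]
print(extract_keyphrases(example))
```

A realistic system would replace the co-occurrence clustering with a proper word similarity measure and the adjacency rule with part-of-speech patterns, but the three stages would remain in the same order.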