Unsupervised topic-oriented keyphrase extraction and its application to Croatian

Authors:
Josip Saratlija;Jan Šnajder;Bojana Dalbelo Bašić
Affiliations:
Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Venue:
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Year:
2011

Abstract

Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractors in how the keyphrases are formed. Our method first builds topically related word clusters, from which document keywords are selected, and then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The proposed method reaches up to F1 = 44.5%, which is below the performance of human annotators but comparable to that of a supervised approach.
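
The pipeline outlined in the abstract (cluster words by topic, select keywords from the clusters, expand keywords into phrases, score with F1) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the greedy Jaccard clustering over sentence co-occurrence, the toy English stopword list, the bigram-based expansion standing in for syntactic validity, and every function name below are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Toy stopword list (assumption); a real system would use a language-specific one.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to", "for", "on", "with"}

def sentences(text):
    """Split text into lowercase token lists, one list per sentence."""
    return [re.findall(r"\w+", s.lower())
            for s in re.split(r"[.!?]", text) if s.strip()]

def cluster_words(sents, threshold=0.5):
    """Greedy stand-in for topical clustering: a word joins the first cluster
    whose seed word shares enough sentences with it (Jaccard >= threshold)."""
    occ = {}  # word -> set of indices of sentences containing it
    for i, sent in enumerate(sents):
        for w in sent:
            if w not in STOPWORDS:
                occ.setdefault(w, set()).add(i)
    clusters = []  # each cluster is a list of words; element 0 is its seed
    for w in sorted(occ, key=lambda w: (-len(occ[w]), w)):
        for c in clusters:
            seed = occ[c[0]]
            if len(seed & occ[w]) / len(seed | occ[w]) >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters, occ

def select_keywords(clusters, occ, k=2):
    """Take the seed word of each of the k heaviest clusters as a keyword."""
    ranked = sorted(clusters,
                    key=lambda c: (-sum(len(occ[w]) for w in c), c[0]))
    return [c[0] for c in ranked[:k]]

def expand(keyword, sents):
    """Expand a keyword to its most frequent stopword-free bigram.
    Crude stand-in for expansion into syntactically valid phrases."""
    bigrams = Counter()
    for sent in sents:
        for left, right in zip(sent, sent[1:]):
            if (keyword in (left, right)
                    and left not in STOPWORDS and right not in STOPWORDS):
                bigrams[f"{left} {right}"] += 1
    return bigrams.most_common(1)[0][0] if bigrams else keyword

def extract_keyphrases(text, k=2):
    """Run the three stages: cluster, select keywords, expand."""
    sents = sentences(text)
    clusters, occ = cluster_words(sents)
    return [expand(kw, sents) for kw in select_keywords(clusters, occ, k)]

def f1(predicted, gold):
    """F1 between a predicted and a gold keyphrase set (exact match)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

TEXT = ("Keyphrase extraction is tedious. "
        "Keyphrase extraction needs labeled data. "
        "Word clusters group topics. Word clusters select keywords.")

if __name__ == "__main__":
    phrases = extract_keyphrases(TEXT)
    print(phrases, f1(phrases, ["keyphrase extraction", "topic clusters"]))
```

With several annotators, as in the evaluation described above, one option would be to compute `f1` against each annotator's keyphrase set separately and average the scores; whether the paper aggregates this way is not stated in the abstract.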