Unsupervised approaches for automatic keyword extraction using meeting transcripts

Authors:
Feifan Liu;Deana Pennell;Fei Liu;Yang Liu
Affiliations:
The University of Texas at Dallas, Richardson, TX;The University of Texas at Dallas, Richardson, TX;The University of Texas at Dallas, Richardson, TX;The University of Texas at Dallas, Richardson, TX
Venue:
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2009

Citing 10

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Learning Algorithms for Keyphrase Extraction
Information Retrieval

Lecture and Presentation Tracking in an Intelligent Meeting Room
ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces

TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Discourse segmentation of multi-party conversation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Improved automatic keyword extraction given more linguistic knowledge
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing

Web resources for language modeling in conversational speech recognition
ACM Transactions on Speech and Language Processing (TSLP)

Domain-specific keyphrase extraction
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2

Coherent keyphrase extraction via web mining
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Automatic extraction and learning of keyphrases from scientific articles
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing

Cited 11

Long story short - Global unsupervised models for keyphrase based meeting summarization
Speech Communication

Automatic generation of personalized annotation tags for Twitter users
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

DFKI KeyWE: Ranking keyphrases extracted from scientific articles
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Towards automatic building of document keywords
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Hot topic detection in professional blogs
AMT'11 Proceedings of the 7th international conference on Active media technology

A Novel lecture browsing system using ranked key phrases and streamgraphs
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Unsupervised topic-oriented keyphrase extraction and its application to Croatian
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Measuring comparability of documents in non-parallel corpora for efficient extraction of (semi-)parallel translation equivalents
EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Unsupervised topic modeling approaches to decision summarization in spoken meetings
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automatic keyphrase extraction from scientific articles
Language Resources and Evaluation

Abstract

This paper explores several unsupervised approaches to automatic keyword extraction from meeting transcripts. Within the TFIDF (term frequency, inverse document frequency) weighting framework, we incorporate part-of-speech (POS) information, word clustering, and sentence salience scores. We also evaluate a graph-based approach that measures the importance of a word based on its connections with other sentences or words. System performance is evaluated in several ways: comparison to human-annotated keywords using F-measure, a weighted score relative to the oracle system performance, and a novel alternative human evaluation. Our results show that the simple unsupervised TFIDF approach performs reasonably well, and that the additional information from POS and sentence scores helps keyword extraction. The graph method, however, is less effective for this domain. Experiments were also performed using speech recognition output, where we observed degradation and different patterns compared to human transcripts.
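As a rough illustration of the plain TFIDF baseline mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes pre-tokenized transcripts, computes IDF over a small hypothetical background collection plus the target transcript, and leaves out the POS filtering, word clustering, and sentence salience extensions. The function name `tfidf_keywords` and all of its parameters are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, background_docs, top_k=5, stopwords=frozenset()):
    """Rank the words of one transcript by TF * IDF (illustrative sketch).

    doc_tokens      -- list of tokens for the transcript to summarize
    background_docs -- list of token lists used only to estimate IDF
    """
    # The target transcript is part of the collection, so every word has df >= 1.
    collection = [set(d) for d in background_docs] + [set(doc_tokens)]
    n_docs = len(collection)

    # Term frequency within the target transcript, skipping stopwords.
    tf = Counter(t for t in doc_tokens if t not in stopwords)

    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in collection if word in d)
        scores[word] = count * math.log(n_docs / df)

    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    meeting = "we need a new remote control design the remote should have fewer buttons".split()
    background = [
        "we need to talk about the budget".split(),
        "the design of a new logo".split(),
    ]
    print(tfidf_keywords(meeting, background, top_k=3))
```

Words that occur in every document of the collection receive an IDF of zero and drop to the bottom of the ranking, which is why a frequent function word such as "the" does not surface even without a stopword list in this toy example.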