“Without the clutter of unimportant words”: Descriptive keyphrases for text visualization

Authors:
Jason Chuang; Christopher D. Manning; Jeffrey Heer
Affiliations:
Stanford University, CA; Stanford University, CA; Stanford University, CA
Venue:
ACM Transactions on Computer-Human Interaction (TOCHI)
Year:
2012

Abstract

Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are often used to create effective visualizations of text. While prior work in HCI and visualization has proposed a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611 responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics of human-generated keyphrases, including phrase length, commonness, position, and part of speech. Next, we systematically assess the contribution of each feature within statistical models of keyphrase quality. We then introduce a method for grouping similar terms and varying the specificity of displayed phrases so that applications can select phrases dynamically based on the available screen space and the current context of interaction. Precision-recall measures show that our technique generates keyphrases that match those selected by human judges. Crowdsourced ratings of tag cloud visualizations rank our approach above other automatic techniques. Finally, we discuss the role of HCI methods in developing new algorithmic techniques suitable for user-facing applications.
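The abstract evaluates generated keyphrases with precision and recall against human-selected phrases. The sketch below illustrates that kind of comparison in generic form; it is not the authors' evaluation code. It assumes exact matching after lowercasing and whitespace normalization, and the article's actual matching criteria may differ. The helper names `normalize` and `precision_recall`, and the example phrases, are hypothetical.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(phrase.lower().split())


def precision_recall(
    predicted: Iterable[str], reference: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted keyphrases vs. a reference set.

    Assumption: a predicted phrase counts as correct only if it exactly matches
    a reference phrase after normalization (no stemming or partial credit).
    """
    pred = {normalize(p) for p in predicted}
    ref = {normalize(r) for r in reference}
    hits = len(pred & ref)  # phrases present in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: machine-ranked phrases vs. phrases chosen by a judge.
    machine = ["text visualization", "Keyphrase extraction", "tag cloud", "graduate students"]
    human = ["keyphrase extraction", "text visualization", "crowdsourcing"]
    p, r, f = precision_recall(machine, human)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # → P=0.50 R=0.67 F1=0.57
```

Exact-match scoring is strict: a generated phrase such as "keyphrase extraction methods" would receive no credit against "keyphrase extraction", so a looser matcher (for example, stemming or head-noun matching) could change the numbers considerably.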