Phrases as subtopical concepts in scholarly text

Authors: Asif-ul Haque; Paul Ginsparg
Affiliations: Cornell University, Ithaca, NY, USA (both authors)
Venue: Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Year: 2011

Abstract

Retrieval of subtopical concepts from scholarly communication systems is now possible through a combination of text and metadata analysis, augmented by user search queries and click logs. Here we investigate how a "phrase", defined as a variable-length sequence of vocabulary words, can be used to represent a concept. We present a method to extract such phrases from a text corpus and rank them using a citation network measure, the compensated normalized link count (CNLC), which measures the extent to which they are propagated along the citation structure of articles. We validate the ranking against an actively determined metric and a passively determined one: human-assigned keywords and terms harvested from search query logs, respectively. The method is demonstrated on full texts and abstracts from seven years of high-energy physics articles from the arXiv preprint database.
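
The two steps named in the abstract, extracting variable-length phrases and scoring them by how they travel along citation links, can be illustrated with a minimal Python sketch. The abstract does not give the CNLC formula, so the score below is only a simplified stand-in and not the authors' measure: it counts citation links whose citing and cited articles both contain a phrase, then divides by the number of articles containing that phrase. The function names, the `max_len` parameter, and the toy corpus are all hypothetical.

```python
from collections import defaultdict


def candidate_phrases(text, max_len=3):
    """Return the set of word sequences of 1..max_len words found in text."""
    words = text.lower().split()
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def link_propagation_scores(docs, citations, max_len=3):
    """Score phrases by shared citation links (a stand-in, NOT the paper's CNLC).

    docs: dict mapping article id -> text
    citations: iterable of (citing_id, cited_id) pairs
    """
    phrases_by_doc = {d: candidate_phrases(t, max_len) for d, t in docs.items()}

    # Number of articles in which each phrase occurs.
    doc_count = defaultdict(int)
    for phrases in phrases_by_doc.values():
        for p in phrases:
            doc_count[p] += 1

    # Number of citation links whose two endpoints both contain the phrase.
    link_count = defaultdict(int)
    for citing, cited in citations:
        shared = phrases_by_doc.get(citing, set()) & phrases_by_doc.get(cited, set())
        for p in shared:
            link_count[p] += 1

    # Assumed normalization: links per containing article.
    return {p: link_count.get(p, 0) / doc_count[p] for p in doc_count}


# Toy corpus (invented for illustration only).
docs = {
    "a": "dark matter halo",
    "b": "dark matter density",
    "c": "string theory",
}
citations = [("b", "a"), ("c", "a")]
scores = link_propagation_scores(docs, citations)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "dark matter" appears on both ends of the link from "b" to "a" and so scores above "string theory", which no linked pair shares. A real pipeline would also need tokenization, stop-word handling, and whatever compensation the CNLC applies, none of which is specified in the abstract.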