Concept-based analysis of scientific literature

Authors:
Chen-Tse Tsai;Gourab Kundu;Dan Roth
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 12
Cited 0

Latent dirichlet allocation

The Journal of Machine Learning Research
Noun-phrase co-occurrence statistics for semiautomatic semantic lexicon construction

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Unsupervised learning of generalized names

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A bootstrapping approach to named entity classification using successive learners

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A bootstrapping method for learning semantic lexicons using extraction pattern contexts

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Dynamic topic models

ICML '06 Proceedings of the 23rd international conference on Machine learning
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
The ACL Anthology Network corpus

NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
Inducing domain-specific semantic class taggers from (almost) nothing

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Rediscovering ACL discoveries through the lens of ACL anthology network citing sentences

ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
Discovering factions in the computational linguistics community

ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper studies the importance of identifying and categorizing scientific concepts as a way to achieve a deeper understanding of the research literature of a scientific community. To reach this goal, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts. We then propose a new clustering algorithm that uses citations' context as a way to cluster the extracted mentions into coherent concepts. Our evaluation of the algorithms against gold standards shows significant improvement over state-of-the-art results. More importantly, we analyze the computational linguistic literature using the proposed algorithms and show four different ways to summarize and understand the research community which are difficult to obtain using existing techniques.