SEERLAB: A system for extracting key phrases from scholarly documents

Authors:
Pucktada Treeratpituk;Pradeep Teregowda;Jian Huang;C. Lee Giles
Affiliations:
Pennsylvania State University, University Park, PA;Pennsylvania State University, University Park, PA;Pennsylvania State University, University Park, PA;Pennsylvania State University, University Park, PA
Venue:
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Year:
2010

Citing (5):
Random Forests. Machine Learning
Disambiguating authors in academic publications using random forests. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Re-examining automatic keyphrase extraction approaches in scientific articles. MWE '09 Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications
Keyphrase extraction in scientific publications. ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
SemEval-2010 task 5: Automatic keyphrase extraction from scientific articles. SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

Cited by (3):
AckSeer: a repository and search engine for automatically extracted acknowledgments from digital libraries. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Towards building and analyzing a social network of acknowledgments in scientific and academic documents. SBP'12 Proceedings of the 5th international conference on Social Computing, Behavioral-Cultural Modeling and Prediction
Automatic keyphrase extraction from scientific articles. Language Resources and Evaluation

Abstract

We describe the SEERLAB system, which participated in the SemEval 2010 Keyphrase Extraction Task. SEERLAB uses the DBLP corpus to generate a set of candidate keyphrases from a document. Random Forest, a supervised ensemble classifier, is then used to select the top keyphrases from the candidate set. SEERLAB achieved an F-score of 0.24 in generating the top 15 keyphrases, which placed it sixth among 19 participating systems. SEERLAB also performed particularly well in generating the top 5 keyphrases, with an F-score that ranked third.
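
The top-15 and top-5 F-scores quoted in the abstract can be illustrated with a minimal sketch of how an F-score over the top k ranked keyphrases might be computed. The function name `f_score_at_k`, the lowercase-and-strip normalisation, and the sample phrases are assumptions made for illustration; this is not the task's official scorer, which may normalise phrases differently (for example by stemming).

```python
def f_score_at_k(predicted, gold, k):
    """Return (precision, recall, F-score) for the top-k predicted keyphrases.

    `predicted` is a ranked list of keyphrases, best first.
    `gold` is the collection of reference keyphrases for the document.
    """
    # Assumed normalisation: lowercase and strip surrounding whitespace.
    top_k = [p.strip().lower() for p in predicted[:k]]
    gold_set = {g.strip().lower() for g in gold}

    # Count distinct predicted phrases that also appear in the gold set.
    matches = len(set(top_k) & gold_set)

    precision = matches / len(top_k) if top_k else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example data, not taken from the SemEval test set.
predicted = ["random forest", "keyphrase extraction", "digital library",
             "DBLP", "neural network"]
gold = ["keyphrase extraction", "random forest", "supervised learning", "dblp"]

precision, recall, f_score = f_score_at_k(predicted, gold, k=5)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

Calling the same function with k=15 and k=5 on one ranked list gives the two kinds of figure the abstract reports; a system can rank differently at the two cut-offs because precision usually falls as more candidates are admitted.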