Domain-specific keyphrase extraction

Authors:
Yi-fang Brook Wu;Quanzhi Li;Razvan Stefan Bot;Xin Chen
Affiliations:
New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

Document keyphrases provide semantic metadata that characterizes a document and gives an overview of its content. They can be used in many text-mining and knowledge-management applications. This paper describes the Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified domain keyphrases to assign weights to candidate keyphrases. The logic of the algorithm is that the more keywords a candidate keyphrase contains, and the more significant those keywords are, the more likely the candidate is a keyphrase. To obtain prior positive inputs, KIP first populates its glossary database with manually identified keyphrases and keywords. It then checks the composition of every noun phrase in a document, looks up the glossary database, and calculates a score for each noun phrase. The noun phrases with the highest scores are extracted as keyphrases.
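
The scoring idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give KIP's actual formula, so the rule used here (summing per-keyword weights) is an assumption, as are the names `GLOSSARY_WEIGHTS`, `score_phrase`, and `rank_candidates` and every weight value. Noun-phrase extraction is not shown; the candidate phrases are supplied by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical glossary: keyword -> significance weight. In KIP these would be
# derived from manually identified domain keyphrases; the values are made up.
GLOSSARY_WEIGHTS: Dict[str, float] = {
    "keyphrase": 0.9,
    "extraction": 0.7,
    "retrieval": 0.6,
    "information": 0.5,
}


def score_phrase(phrase: str, weights: Dict[str, float]) -> float:
    """Assumed scoring rule: sum the weights of the glossary keywords in the phrase.

    More keywords, or more significant keywords, give a higher score.
    Words that are not in the glossary contribute nothing.
    """
    return sum(weights.get(word, 0.0) for word in phrase.lower().split())


def rank_candidates(
    candidates: List[str], weights: Dict[str, float], top_n: int
) -> List[Tuple[str, float]]:
    """Score every candidate noun phrase and keep the top_n highest-scoring ones."""
    scored = [(phrase, score_phrase(phrase, weights)) for phrase in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Candidate noun phrases, assumed to come from an upstream noun-phrase extractor.
candidates = ["keyphrase extraction", "information retrieval", "last week"]
ranked = rank_candidates(candidates, GLOSSARY_WEIGHTS, top_n=2)
for phrase, score in ranked:
    print(f"{phrase}: {score:.2f}")
```

In this sketch, "last week" contains no glossary keywords and scores zero, so it is not selected. A real system would need a principled way to derive keyword weights and would likely normalize for phrase length; neither is specified in the abstract.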