KPSpotter: a flexible information gain-based keyphrase extraction system

Authors:
Min Song;Il-Yeol Song;Xiaohua Hu
Affiliations:
Drexel University, Philadelphia, PA;Drexel University, Philadelphia, PA;Drexel University, Philadelphia, PA
Venue:
WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Year:
2003

Abstract

To tackle the issue of information overload, we present KPSpotter, an Information Gain-based KeyPhrase Extraction System. KPSpotter is a flexible, web-enabled keyphrase extraction system that can process various input formats, including web data, and that generates both the extraction model and the list of keyphrases in XML. KPSpotter uses two features for training and for extracting keyphrases: 1) TF*IDF and 2) Distance from First Occurrence. Input training and testing collections are processed in three stages: 1) Data Cleaning, 2) Data Tokenizing, and 3) Data Discretizing. To measure system performance, the keyphrases extracted by KPSpotter are compared with those assigned by the documents' authors. Our experiments show that the performance of KPSpotter is equivalent to that of KEA, a well-known keyphrase extraction system. KPSpotter, however, differs from other extraction systems in four ways. First, KPSpotter employs a new keyphrase extraction technique that combines the Information Gain data mining measure with several Natural Language Processing techniques, such as stemming and case-folding. Second, KPSpotter can process various types of input data, such as XML, HTML, and unstructured text, and generates XML output. Third, the user can provide input data and execute KPSpotter through the Internet. Fourth, for efficiency and performance reasons, KPSpotter stores candidate keyphrases and their related information, such as frequency and stemmed form, in an embedded database management system.
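The abstract names the two features and the Information Gain measure but gives no formulas. The sketch below illustrates one common way to compute them; it is not KPSpotter's actual implementation. The function names, the add-one smoothing in the IDF term, the base-2 logarithm, and the treatment of each candidate phrase as a single token are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(phrase, doc_tokens, corpus):
    """TF*IDF of `phrase` in one document, relative to a corpus of token lists."""
    tf = doc_tokens.count(phrase) / len(doc_tokens)
    df = sum(1 for tokens in corpus if phrase in tokens)
    # Add-one smoothing avoids division by zero (an assumption of this sketch).
    idf = math.log2((len(corpus) + 1) / (df + 1))
    return tf * idf


def first_occurrence(phrase, doc_tokens):
    """Fraction of the document that precedes the first appearance of `phrase`."""
    return doc_tokens.index(phrase) / len(doc_tokens)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(bins, labels):
    """Drop in label entropy after splitting candidates by discretized feature bin.

    `bins[i]` is the discretized feature value of candidate i and `labels[i]`
    says whether candidate i is an author-assigned keyphrase.
    """
    total = len(labels)
    remainder = 0.0
    for b in set(bins):
        subset = [lab for x, lab in zip(bins, labels) if x == b]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

In this sketch, a discretized feature whose bins separate keyphrases from non-keyphrases perfectly yields the maximum gain for a balanced two-class set (1 bit), while a feature whose bins are unrelated to the labels yields a gain of 0. The discretizing stage mentioned in the abstract would supply the `bins` values; how KPSpotter chooses the bin boundaries is not stated here.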