Using Wikipedia concepts and frequency in language to extract key terms from support documents

Authors:
M. Romero; A. Moreo; J. L. Castro; J. M. Zurita
Affiliations:
Dep. of Computer Science and Artificial Intelligence, University of Granada, Spain (all four authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2012



Abstract

In this paper, we present a new key term extraction system able to handle the particularities of "support documents". Our system takes advantage of both frequency-based and thesaurus-based approaches to recognize two different classes of key terms. On the one hand, it identifies the multi-domain key terms of the collection, using Wikipedia as a knowledge resource. On the other hand, it extracts specific key terms that are closely related to the context of a given support document, using frequency in language as the criterion to detect and rank such terms. To validate the system, we designed a set of experiments on a collection of Frequently Asked Questions (FAQ) documents. Since our approach is generic, only minor modifications should be needed to adapt it to other kinds of support documents. The empirical results support the validity of our approach.
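The abstract names two signals but gives no formulas. Purely as an illustration, the Python sketch below assumes (a) that multi-domain terms are found by exact matching of word n-grams against Wikipedia article titles, and (b) that specific terms are scored by their count in the document times the log-inverse of their general-language frequency. `WIKIPEDIA_TITLES`, `LANGUAGE_FREQ`, `UNSEEN_FREQ` and the scoring formula are hypothetical placeholders, not the authors' resources or method.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for Wikipedia: a tiny set of lowercased article titles.
# Exact title matching is an assumption, not the paper's method.
WIKIPEDIA_TITLES = {"printer", "toner cartridge", "password"}

# HYPOTHETICAL general-language frequencies (occurrences per million words).
# A real system would estimate these from a large reference corpus.
LANGUAGE_FREQ = {
    "the": 50000, "a": 22000, "i": 20000, "in": 18000, "is": 10000,
    "do": 4000, "my": 3000, "how": 2000, "empty": 40, "replace": 30,
    "printer": 10, "cartridge": 2, "toner": 1,
}
UNSEEN_FREQ = 0.5  # assumed frequency for words missing from the table


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def wikipedia_terms(tokens, max_n=2):
    """Return the n-grams (n <= max_n) that exactly match a known title."""
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in WIKIPEDIA_TITLES:
                found.add(gram)
    return found


def specific_terms(tokens, top_k=3):
    """Rank words that are frequent in the document but rare in the language.

    score(w) = count_in_doc(w) * ln(1e6 / freq_per_million(w))
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(tokens)
    scores = {
        w: c * math.log(1e6 / LANGUAGE_FREQ.get(w, UNSEEN_FREQ))
        for w, c in counts.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]


faq = "How do I replace the toner cartridge in my printer? The toner cartridge is empty."
tokens = tokenize(faq)
print(sorted(wikipedia_terms(tokens)))  # ['printer', 'toner cartridge']
print(specific_terms(tokens))           # ['toner', 'cartridge', 'printer']
```

With these toy tables, "toner" and "cartridge" outrank "printer" because they are rarer in general language and occur twice in the question, while the log term pushes function words such as "the" down without a separate stopword list.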