Coherent keyphrase extraction via web mining

Authors:
Peter D. Turney
Affiliations:
Institute for Information Technology, National Research Council of Canada, Ottawa, Ontario, Canada
Venue:
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Year:
2003

Abstract

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).
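
The idea of using statistical association as evidence of semantic relatedness can be illustrated with a small sketch. The abstract does not say which association measure is used or how the web is queried, so everything below is an assumption made for illustration: the measure is pointwise mutual information (PMI) computed from page hit counts, the hit counts are made-up toy numbers rather than real search results, and the names `HITS`, `JOINT_HITS`, `pmi`, and `coherence` are hypothetical. It is not the paper's implementation and does not reproduce Kea.

```python
import math

# Toy stand-ins for web hit counts (hypothetical numbers, not real search data).
TOTAL_PAGES = 1_000_000
HITS = {
    "neural network": 5000,
    "machine learning": 8000,
    "banana bread": 3000,
}
JOINT_HITS = {
    frozenset(["neural network", "machine learning"]): 1200,
    frozenset(["neural network", "banana bread"]): 2,
    frozenset(["machine learning", "banana bread"]): 3,
}


def pmi(a, b):
    """Pointwise mutual information of two phrases, from hit counts.

    PMI is an assumed choice of association measure for this sketch.
    """
    joint = JOINT_HITS.get(frozenset([a, b]), 0)
    # Add-one smoothing keeps the score finite when a pair never co-occurs.
    p_ab = (joint + 1) / TOTAL_PAGES
    p_a = HITS[a] / TOTAL_PAGES
    p_b = HITS[b] / TOTAL_PAGES
    return math.log2(p_ab / (p_a * p_b))


def coherence(candidate, candidates):
    """Average association between one candidate and all the others."""
    others = [c for c in candidates if c != candidate]
    return sum(pmi(candidate, c) for c in others) / len(others)


candidates = list(HITS)
scores = {c: coherence(c, candidates) for c in candidates}

# The candidate with the weakest average association is the likely outlier.
outlier = min(scores, key=scores.get)
print(outlier)
for phrase, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phrase}: {score:.2f}")
```

With these toy counts, the two related phrases score well above the unrelated one, which is the kind of signal that could be used to demote outlier keyphrases. In a learning-based extractor, a score like this would more plausibly be supplied as one feature among several than used as a hard filter.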