Automatic key phrase extraction is a useful tool in many text-related applications such as clustering and summarization. State-of-the-art methods are designed to extract key phrases from traditional text such as technical papers. Applying these methods to Web documents, which often contain diverse and heterogeneous content, is both of particular interest and a particular challenge in the information age. In this work, we investigate the significance of narrative text classification in the task of automatic key phrase extraction in Web document corpora. We benchmark three methods, TFIDF, KEA, and Keyterm, each used to extract key phrases from all the plain text and from only the narrative text of Web pages. ANOVA tests are used to analyze the ranking data collected in a user study, using the quantitative measures of acceptable percentage and quality value. The evaluation shows that key phrases extracted from only the narrative text are significantly better than those obtained from all the plain text of Web pages. This demonstrates that narrative text classification is indispensable for effective key phrase extraction in Web document corpora.
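As an illustration of the simplest of the three benchmarked methods, the following is a minimal sketch of TFIDF term ranking. It is not the implementation evaluated in this work: it scores single words rather than multi-word phrases, uses the common tf × log(N/df) weighting as an assumption, and the names `tokenize`, `tfidf_scores`, and `top_terms` are hypothetical helpers introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    (an assumed, commonly used weighting; other variants exist).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def top_terms(score_map, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(score_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

To mirror the comparison described above, the same functions could be run twice per corpus: once on all the plain text of each page and once on only the text that a narrative classifier retains. The classifier itself is outside the scope of this sketch.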