In this paper we present three new methods to extract keywords from web pages using Wikipedia as an external source of information. The information drawn from Wikipedia includes article titles, keyword co-occurrence, and the categories associated with each Wikipedia definition. We compare our methods with three baseline keyword extraction methods: (i) all the terms of a web page, (ii) a TF-IDF implementation that extracts single weighted words from a web page, and (iii) a Wikipedia-based keyword extraction method previously proposed in the literature. The comparison covers three distinct scenarios, all related to our target application: the selection of ads in a context-based advertising system. In the first scenario, the target pages on which ads are placed were extracted from Wikipedia articles, whereas in the other two scenarios they were extracted from a news web site. Experimental results show that our methods are competitive solutions for the task of selecting good keywords to represent target web pages, while remaining simple, effective, and time efficient. For instance, in the first scenario, our best method for extracting keywords from Wikipedia articles achieved an improvement of 33% over the second best baseline and a gain of 26% over using all the terms.
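To make baseline (ii) concrete, the following is a minimal sketch of how single words of a page might be weighted with TF-IDF and the top-ranked ones kept as keywords. It is not the implementation evaluated in the paper: the tokenizer, the smoothed IDF formula, the function name `tfidf_keywords`, and the toy corpus are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(page, corpus, k=3):
    """Return the k single words of `page` with the highest TF-IDF weight.

    `corpus` is a hypothetical list of reference documents, used only to
    estimate document frequencies.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n_docs = len(docs)
    tf = Counter(tokenize(page))
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in docs if term in d)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = freq * idf
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy data, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell today",
]
page = "the jaguar is a big cat and the jaguar hunts at night"
keywords = tfidf_keywords(page, corpus, k=2)
```

In this toy run the frequent but uninformative word "the" is outweighed by "jaguar", which is repeated in the page and absent from the reference corpus. A sketch like this still ranks function words such as "a" highly when they are missing from a tiny corpus, so a real system would likely add a stop-word list and a much larger collection for document frequencies.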