Ranking-constrained keyword sequence extraction from web documents

Authors:
Dingyi Chen; Xue Li; Jing Liu; Xia Chen
Affiliations:
The University of Queensland, Brisbane, Australia; The University of Queensland, Brisbane, Australia; The University of Queensland, Brisbane, Australia and Xidian University, Xi'an, China; The University of Queensland, Brisbane, Australia
Venue:
ADC '09 Proceedings of the Twentieth Australasian Conference on Australasian Database - Volume 92
Year:
2009



Abstract

Given a large volume of Web documents, we consider the problem of finding the shortest keyword sequence for each document such that, when the keyword sequence is submitted to a given search engine, the corresponding Web document is retrieved and ranked first in the results. We call this system an Inverse Search Engine (ISE). Whenever a shortest keyword sequence is found for a given Web document, that document is returned as the top result by the given search engine. The resulting keyword sequence is therefore search-engine dependent. The ISE can thus be used as a tool to manage Web content in terms of the extracted shortest keyword sequences. In this way, a traditional keyword extraction process is constrained by the document ranking method adopted by a search engine. The significance is that all Web-searchable documents on the World Wide Web can then be partitioned according to their keyword phrases. This paper discusses the design and implementation of the proposed ISE. Four evaluation measures are proposed and used to show the effectiveness and efficiency of our approach. The experimental results establish a benchmark for further research.
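To make the problem definition concrete, here is a minimal Python sketch of the search described above. It is not the ISE's actual method, which the abstract does not detail. The ranking function `score` is a hypothetical stand-in for the real search engine (it simply sums query-term frequencies), and the brute-force search over keyword subsets of increasing length, along with the names `shortest_keyword_sequence`, `rank_first` and `max_len`, are assumptions for illustration. A document counts as "ranked first" only if it strictly outscores every other document.

```python
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def score(query, doc_tokens) -> int:
    """Hypothetical toy ranking: sum of the query terms' frequencies in the document.

    Stands in for the real search engine, which an ISE would query instead.
    """
    counts = Counter(doc_tokens)
    return sum(counts[term] for term in query)


def rank_first(query, docs, target) -> bool:
    """True if docs[target] strictly outscores every other document for `query`."""
    target_score = score(query, docs[target])
    if target_score == 0:
        return False
    return all(score(query, doc) < target_score
               for i, doc in enumerate(docs) if i != target)


def shortest_keyword_sequence(docs, target, max_len=3):
    """Brute-force search for the shortest keyword set that ranks docs[target] first.

    Candidates are drawn from the target document's own vocabulary and tried in
    order of increasing length; returns None if nothing up to max_len works.
    """
    vocab = sorted(set(docs[target]))
    for length in range(1, max_len + 1):
        for query in combinations(vocab, length):
            if rank_first(query, docs, target):
                return list(query)
    return None


docs = [tokenize(t) for t in ("web search engine", "web search", "search engine")]
print(shortest_keyword_sequence(docs, 0))  # ['engine', 'web']
print(shortest_keyword_sequence(docs, 1))  # None
```

In this toy collection, document 0 needs two keywords, while documents 1 and 2 have no keyword set that ranks them strictly first, because document 0 contains all of their words. Since the toy scorer ignores word order, unordered keyword sets suffice here; a real engine may be order-sensitive, and the exhaustive enumeration grows combinatorially with vocabulary size, so a practical ISE would likely need a more efficient search strategy.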