Mining "Hidden phrase" definitions from the web

Authors:
Hung V. Nguyen;P. Velamuru;D. Kolippakkam;H. Davulcu;H. Liu;M. Ates
Affiliations:
Department of Computer Science and Engineering, Arizona State University, AZ;Department of Computer Science and Engineering, Arizona State University, AZ;Department of Computer Science and Engineering, Arizona State University, AZ;Department of Computer Science and Engineering, Arizona State University, AZ;Department of Computer Science and Engineering, Arizona State University, AZ;Department of Computer Science and Engineering, NJ
Venue:
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Year:
2003

Abstract

Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking. A "hidden phrase" is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency, and (ii) search engines to test whether the content of the body of a document matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.
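
The definition of a "hidden phrase" above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that META phrases come from a comma-separated `<meta name="keywords">` tag and that a phrase "occurs" in the body when it matches the visible body text on word boundaries, ignoring case. The names `hidden_phrases` and `_PageParser` are hypothetical and introduced only for illustration.

```python
import re
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects META keyword phrases and visible body text from one page."""

    def __init__(self):
        super().__init__()
        self.meta_phrases = []   # phrases from <meta name="keywords">
        self.body_chunks = []    # text nodes found inside <body>
        self._in_body = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: phrases are comma-separated in the content attribute.
            content = attrs.get("content") or ""
            for raw in content.split(","):
                phrase = " ".join(raw.lower().split())
                if phrase:
                    self.meta_phrases.append(phrase)
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self.body_chunks.append(data)


def hidden_phrases(html):
    """Return META phrases that do not occur in the page body."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace and case so matching is not layout-sensitive.
    body = " ".join(" ".join(parser.body_chunks).lower().split())
    hidden = []
    for phrase in parser.meta_phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if not re.search(pattern, body):
            hidden.append(phrase)
    return hidden


page = (
    '<html><head><meta name="keywords" content="cheap flights, travel deals">'
    "</head><body><p>Find travel deals here.</p></body></html>"
)
print(hidden_phrases(page))
```

Exact word-boundary matching is a deliberate simplification: it treats inflected or reordered variants of a phrase as absent from the body, which a real system would likely need to handle.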