Reinforcing Web-object Categorization Through Interrelationships

Authors:
Gui-Rong Xue;Yong Yu;Dou Shen;Qiang Yang;Hua-Jun Zeng;Zheng Chen
Affiliations:
Computer Science and Engineering, Shanghai Jiao-Tong University, Shanghai, P.R. China 200030;Computer Science and Engineering, Shanghai Jiao-Tong University, Shanghai, P.R. China 200030;Hong Kong University of Science and Technology, Kowloon, Hong Kong;Hong Kong University of Science and Technology, Kowloon, Hong Kong;Microsoft Research Asia, 5F, Sigma Center, Beijing, P.R. China 100080;Microsoft Research Asia, 5F, Sigma Center, Beijing, P.R. China 100080
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

Existing categorization algorithms deal with homogeneous Web objects; when they take relationships with other object types into account, they treat the interrelated objects only as additional features. However, focusing on any single aspect of the inter-object relationships is not sufficient to reveal the true categories of Web objects. In this paper, we propose a novel categorization algorithm, the Iterative Reinforcement Categorization algorithm (IRC), which exploits the full interrelationships among different types of Web objects, including Web pages and queries. IRC classifies interrelated Web objects by iteratively reinforcing the individual classification results for each object type through their interrelationships. Experiments on a clickthrough-log dataset from the MSN search engine show that, in terms of the F1 measure, IRC achieves a 26.4% improvement over a pure content-based classification method, a 21% improvement over a query-metadata-based method, and a 16.4% improvement over the well-known virtual-document-based method. Our experiments also show that IRC converges fast enough to be applicable to real-world applications.
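The abstract does not state IRC's update rule, so the following is only a minimal sketch of the general idea of iterative reinforcement between two object types. It assumes each page and query starts with a class-probability vector from some content-based classifier, and that clickthrough counts link queries to pages. The linear blending, the weight `alpha`, the stopping tolerance, and all function names are illustrative assumptions, not the authors' method.

```python
def normalize(vec):
    """Scale a non-negative vector with positive sum so it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]


def propagate(scores, links, n_targets):
    """For each target, take the link-weighted average of its sources' scores.

    links is a list of (source_index, target_index, weight) triples.
    Targets with no incoming links get None.
    """
    n_classes = len(scores[0])
    sums = [[0.0] * n_classes for _ in range(n_targets)]
    for src, dst, weight in links:
        for c in range(n_classes):
            sums[dst][c] += weight * scores[src][c]
    return [normalize(row) if sum(row) > 0 else None for row in sums]


def blend(prior, neighbor, alpha):
    """Mix an object's own prior with evidence from its linked objects."""
    if neighbor is None:  # isolated object: keep the content-based prior
        return list(prior)
    return [alpha * p + (1 - alpha) * n for p, n in zip(prior, neighbor)]


def max_change(old, new):
    """Largest absolute change in any class probability."""
    return max(abs(a - b) for ro, rn in zip(old, new) for a, b in zip(ro, rn))


def iterative_reinforcement(page_prior, query_prior, clicks,
                            alpha=0.5, max_iter=50, tol=1e-6):
    """Hypothetical reinforcement loop (assumed update rule, not from the paper).

    clicks is a list of (query_index, page_index, click_count) triples.
    Returns (page_scores, query_scores, iterations_used).
    """
    pages = [normalize(p) for p in page_prior]
    queries = [normalize(q) for q in query_prior]
    page_base, query_base = pages, queries
    query_to_page = list(clicks)
    page_to_query = [(p, q, w) for q, p, w in clicks]

    iterations = 0
    while iterations < max_iter:
        iterations += 1
        # Queries reinforce pages, then the updated pages reinforce queries.
        from_queries = propagate(queries, query_to_page, len(pages))
        new_pages = [blend(b, n, alpha) for b, n in zip(page_base, from_queries)]
        from_pages = propagate(new_pages, page_to_query, len(queries))
        new_queries = [blend(b, n, alpha) for b, n in zip(query_base, from_pages)]

        delta = max(max_change(pages, new_pages), max_change(queries, new_queries))
        pages, queries = new_pages, new_queries
        if delta < tol:
            break
    return pages, queries, iterations


# Toy data: two classes, two pages, two queries (all values made up).
page_prior = [[0.9, 0.1], [0.5, 0.5]]
query_prior = [[0.5, 0.5], [0.2, 0.8]]
clicks = [(0, 0, 3), (0, 1, 1), (1, 1, 2)]
pages, queries, used = iterative_reinforcement(page_prior, query_prior, clicks)
```

In this toy run, query 0 starts out ambiguous but is clicked mostly on a page that is confidently in class 0, so its score shifts toward class 0. Page 1, which is also ambiguous, is pulled toward class 1 by query 1. Because each update keeps a fixed share `alpha` of the original prior, the scores stabilize after a small number of passes in this sketch.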