sDoc: exploring social wisdom for document enhancement in web mining

Authors:
Xiaoxun Zhang;Lichun Yang;Xian Wu;Honglei Guo;Zhili Guo;Shenghua Bao;Yong Yu;Zhong Su
Affiliations:
IBM China Research Lab, Beijing, China;Shanghai Jiaotong University, Shanghai, China;IBM China Research Lab, Beijing, China;IBM China Research Lab, Beijing, China;IBM China Research Lab, Beijing, China;IBM China Research Lab, Beijing, China;Shanghai Jiaotong University, Shanghai, China;IBM China Research Lab, Beijing, China
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

A web document can be seen as composed of textual content and social metadata of various forms (e.g., anchor text, search queries, and social annotations), both of which are valuable indicators of the document's semantic content. However, due to the free nature of the web, both streams of web data suffer from serious noise and sparseness, which have become major challenges for many web mining applications. Previous work has shown that the content of a web document can be enhanced by integrating anchor text and search queries. In this paper, we study the problem of exploring emergent social annotations for document enhancement and propose a novel reinforcement framework to generate a "social representation" of a document. In contrast to prior work, our framework enhances textual content and social annotations simultaneously by exploiting a mutual reinforcement relationship between them. Two convergent models, the social content model and the social annotation model, are symmetrically derived from the framework to represent the enhanced textual content and the enhanced social annotations, respectively. The enhanced document is referred to as a Social Document, or sDoc, in that it embeds complementary viewpoints from many web authors and many web visitors. In this sense, the document's semantics are enhanced by exploring social wisdom. We build the framework on a large Del.icio.us dataset and evaluate it through three typical web mining applications: annotation, classification, and retrieval. Experimental results demonstrate that the social representation of web documents significantly improves the performance of these applications.
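The abstract does not give the update rules of the framework, so the following is only a generic, hypothetical sketch of what a mutual reinforcement iteration between content and annotations could look like. It is not the paper's social content model or social annotation model. The function names, the cosine weighting, the `damping` parameter, and the toy data are all assumptions made for illustration. Each document's term distribution is smoothed toward the term distributions of documents with similar tags, and each tag distribution is smoothed toward those of documents with similar terms, until the values stop changing or an iteration cap is reached.

```python
from math import sqrt

def normalize(vec):
    """Scale a non-negative vector so it sums to 1 (left as-is if all zero)."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def smooth(base, current, guides, damping):
    """Mix each original vector with a similarity-weighted average of the
    other documents' current vectors. Similarity is measured on `guides`
    (the other view of the same documents)."""
    n = len(base)
    result = []
    for i in range(n):
        weights = [cosine(guides[i], guides[j]) if j != i else 0.0
                   for j in range(n)]
        total = sum(weights)
        if total == 0:
            # No similar documents: keep the original distribution.
            result.append(list(base[i]))
            continue
        neighbor = [sum(weights[j] * current[j][k] for j in range(n)) / total
                    for k in range(len(base[i]))]
        result.append([(1 - damping) * b + damping * nb
                       for b, nb in zip(base[i], neighbor)])
    return result

def mutual_reinforce(content0, annot0, damping=0.5, tol=1e-9, max_iter=100):
    """Hypothetical illustration: alternately enhance content and annotations.
    Assumption: this update rule is NOT taken from the paper."""
    base_c = [normalize(v) for v in content0]
    base_a = [normalize(v) for v in annot0]
    content, annot = base_c, base_a
    for _ in range(max_iter):
        new_content = smooth(base_c, content, annot, damping)
        new_annot = smooth(base_a, annot, content, damping)
        delta = max(
            abs(x - y)
            for old, new in ((content, new_content), (annot, new_annot))
            for u, v in zip(old, new)
            for x, y in zip(u, v)
        )
        content, annot = new_content, new_annot
        if delta < tol:
            break
    return content, annot

# Toy data (made up): 3 documents, 3 content terms, 2 tags.
# Documents 0 and 1 share a tag but use different terms; document 2 is unrelated.
toy_content = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
toy_annot = [[1, 0], [1, 0], [0, 1]]
enhanced_content, enhanced_annot = mutual_reinforce(toy_content, toy_annot)
```

In this toy run, document 0 picks up some weight on the term used by document 1 because the two share a tag, while the unrelated document 2 is left unchanged. The anchoring to the original distributions (via `damping`) is a design choice of this sketch that keeps each enhanced vector a probability distribution.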