Blog analysis and mining technologies to summarize the wisdom of crowds

  • Authors:
  • Belle L. Tseng

  • Affiliations:
  • NEC Laboratories America, Cupertino, CA

  • Venue:
  • Proceedings of the 8th International Workshop on Multimedia Data Mining (associated with ACM SIGKDD 2007)
  • Year:
  • 2007

Abstract

Blogs have become a prominent social medium, creating a fast-growing social network on the Internet. Blogs enable users to publish content quickly and easily, including highly personal thoughts and professional opinions. Our objective is to understand the blogosphere and summarize the wisdom of crowds. To achieve this goal, my presentation will focus on three graph analysis and mining technologies: (1) clustering, (2) ranking, and (3) visualization. A blog is typically a web site that consists of dated entries in reverse chronological order, written and maintained by a user (the blogger). Since a blog entry can contain hyperlinks to web pages or to other blog entries, the information structure of blogs and links can be seen as a temporal graph. Temporal graphs open a new domain for social media analysis. The first technology is evolutionary graph clustering to discover blog communities. New challenges arise when traditional clustering techniques are applied to temporal data, such as blog data and streaming data, where the relations among data evolve with time. On the one hand, because of long-term concept drift, a naive approach based on aggregation will not give satisfactory clustering results. On the other hand, short-term variations are very often due to noise. Clustering results should therefore not change dramatically over a short time and should exhibit temporal smoothness. We present two frameworks for incorporating temporal smoothness into evolutionary spectral clustering. The second technology is information flow ranking to identify influential bloggers. People constantly influence each other in all facets of life, including the wisdom of crowds in the blogosphere. Information flows through a social network in which individuals influence each other. We present two graph ranking algorithms that leverage information flow to identify which nodes are influential and where information should flow. The third technology is temporal graph visualization to understand blogger dynamics. Our vision is to summarize the blogosphere as a social network of bloggers with wisdom. Discovering blog communities and ranking influential bloggers provide some insights. To observe behaviors and dynamics, we present several visualization tools that help researchers observe patterns, including a demo of our blog summarization.
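
As a rough illustration of the temporal-graph view and the temporal-smoothness idea in the abstract, the Python sketch below groups timestamped blog links into per-window snapshots and then blends each snapshot with its smoothed history. The function names, the fixed-width time window, and the exponential blending weight `alpha` are assumptions made for illustration only. This is not the evolutionary spectral clustering presented in the talk; it shows only how short-term noise in link weights can be damped before any clustering step is applied.

```python
from collections import defaultdict


def build_snapshots(links, window):
    """Group (source, target, time) links into weighted edge maps, one per time window.

    Windows that contain no links are skipped (an illustrative simplification).
    """
    snapshots = defaultdict(lambda: defaultdict(float))
    for source, target, time in links:
        # Each hyperlink adds one unit of weight to its edge in its window.
        snapshots[time // window][(source, target)] += 1.0
    return [dict(snapshots[key]) for key in sorted(snapshots)]


def smooth_snapshots(snapshots, alpha):
    """Blend each snapshot with the smoothed history.

    The first snapshot is kept as is; every later one becomes
    alpha * current + (1 - alpha) * previous_smoothed.
    """
    smoothed = []
    previous = None
    for current in snapshots:
        if previous is None:
            blended = dict(current)
        else:
            blended = {
                edge: alpha * current.get(edge, 0.0)
                + (1 - alpha) * previous.get(edge, 0.0)
                for edge in set(current) | set(previous)
            }
        smoothed.append(blended)
        previous = blended
    return smoothed


# Hypothetical blog links: (linking blog, linked blog, day of the entry).
links = [("a", "b", 0), ("a", "b", 1), ("b", "c", 5), ("c", "a", 11)]
snapshots = build_snapshots(links, window=5)
smoothed = smooth_snapshots(snapshots, alpha=0.5)
```

A larger `alpha` makes the smoothed graph follow the current snapshot more closely, while a smaller `alpha` favors history, which mirrors the trade-off between tracking long-term drift and suppressing short-term noise.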