Organization and Tagging of Blog and News Entries Based on Content Reuse

  • Authors:
  • Jong Wook Kim; K. Selçuk Candan; Junichi Tatemura

  • Affiliations:
  • Comp. Sci. and Eng. Dept., Arizona State University, Tempe, USA 85287; Comp. Sci. and Eng. Dept., Arizona State University, Tempe, USA 85287; NEC Labs, America, Cupertino, USA 95014

  • Venue:
  • Journal of Signal Processing Systems
  • Year:
  • 2010

Abstract

As blogs (Weblogs) grow in popularity as dynamic platforms for information dissemination and sharing, so does the use of blogs that track and comment on real-world events (political, news, entertainment). The success of the blog as a popular medium for information sharing, however, also exposes its weakest spot: there is little support for finding blog entries beyond keyword-based search. Consequently, there is a pressing need for navigational support that can help users relate entries within the large, diverse, and inherently distributed collection that makes up the blogosphere. In this paper, we first note that the large degree of content overlap in the form of quotation/commentary pairs (as well as content borrowing across media outlets) can be leveraged to track topic development patterns within the blogosphere. Relying on this observation, we propose focus and flow analysis techniques that rely on reuse detection to help place blog entries into logical organizations. We then show that these implicit or explicit quotations, together with focus analysis, can be leveraged to identify the contexts in which entries occur, resulting in more effective tagging. To this end, we propose CDIP (a collection-driven, yet individuality-preserving tagging system), which relies on relationships provided by quotation/reuse detection and semantic-focus analysis to automatically tag blogs in such a way that not only do related blogs share tags, but the individuality of each entry is also preserved for discriminating tag-based access.
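
The abstract does not say how reuse is detected, so the following is only a minimal sketch of one common stand-in technique, word n-gram (shingle) containment, and should not be read as the authors' method. The function names `shingles` and `containment`, the shingle size `n=3`, and the sample sentences are all assumptions made for illustration.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(entry, source, n=3):
    """Fraction of the entry's shingles that also occur in the source.

    A high value suggests the entry quotes or borrows from the source.
    Hypothetical helper; the paper's actual reuse measure is not given
    in the abstract.
    """
    entry_shingles = shingles(entry, n)
    if not entry_shingles:
        return 0.0  # entry too short to form even one shingle
    return len(entry_shingles & shingles(source, n)) / len(entry_shingles)


# Made-up example: a blog entry commenting on a news sentence.
news = "the senate passed the budget bill late on friday night"
blog = "I think the senate passed the budget bill too quickly"
score = containment(blog, news)
print(score)
```

In a sketch like this, entry pairs whose score exceeds some chosen threshold could be linked as quotation/commentary pairs, and those links could then feed the kind of organization and tag sharing the abstract describes.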