CUTS: CUrvature-based development pattern analysis and segmentation for blogs and other Text Streams

  • Authors: Yan Qi; K. Selçuk Candan

  • Affiliations: Arizona State University, Tempe, AZ (both authors)

  • Venue: Proceedings of the seventeenth conference on Hypertext and hypermedia
  • Year: 2006

Abstract

Weblogs (blogs) are becoming a prominent form of information exchange on the Internet. A large number and variety of blogs, such as personal journals and commentaries, are available for general consumption. However, effective indexes and navigation structures (like the table of contents in a book) are not available for blogs. As a result, it is generally not possible to navigate among the entries of a given blog collection in an informed manner. This paper focuses on the segmentation of entries in filter-type [9] blogs, with the aim of using this information to develop hypertext and navigational aids. In particular, we are interested in the analysis of topic development patterns, which provide information not only about the entries themselves, but also about how these entries develop and relate to each other. The proposed algorithm, CUTS, maps entries onto a curve in a way that makes a variety of topic development patterns apparent. We then use curve analysis for automatic segmentation of topics. The resulting base topic segments are classified into different topic development patterns that can be visualized and indexed. Experimental results show that the proposed technique identifies boundaries in text streams, especially filter-style blogs, more accurately than existing schemes. Furthermore, compared with other topic segmentation methods, the proposed mechanism highlights not only topic boundaries, but also topic development patterns.
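The general idea of turning a stream of entries into a curve and segmenting it by shape can be illustrated with a small sketch. This is not the CUTS algorithm itself: the abstract does not specify the mapping or the curve analysis, so everything below is an assumption made for illustration. The sketch uses bag-of-words cosine dissimilarity between consecutive entries, accumulates it into a monotone curve over the entry index, and places a boundary wherever the step between neighbouring points exceeds a threshold. The function names and the `slope_threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts (illustrative tokenizer)."""
    return Counter(text.lower().split())


def cosine_dissimilarity(a, b):
    """1 - cosine similarity of two term-count vectors; 1.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - (dot / norm if norm else 0.0)


def entries_to_curve(entries):
    """Map entries to a curve: y[i] is the dissimilarity accumulated up to entry i.

    Assumption for illustration only; the paper's actual mapping is not
    described in the abstract.
    """
    if not entries:
        return []
    vectors = [bag_of_words(e) for e in entries]
    curve = [0.0]
    for prev, cur in zip(vectors, vectors[1:]):
        curve.append(curve[-1] + cosine_dissimilarity(prev, cur))
    return curve


def segment(curve, slope_threshold=0.8):
    """Split the curve into (first_index, last_index) segments.

    A boundary is placed before entry i when the step from i-1 to i is
    steeper than slope_threshold (a hypothetical tuning parameter).
    """
    if not curve:
        return []
    segments, start = [], 0
    for i in range(1, len(curve)):
        if curve[i] - curve[i - 1] > slope_threshold:
            segments.append((start, i - 1))
            start = i
    segments.append((start, len(curve) - 1))
    return segments


entries = [
    "python code bug fix",
    "python code bug report",
    "python code release",
    "garden tomato plant",
    "garden tomato soil",
]
print(segment(entries_to_curve(entries)))
```

In this toy stream the curve rises gently within each topic and jumps sharply between the third and fourth entries, so the sketch reports two segments. A flat or gently sloped stretch versus a sharp jump loosely mirrors the kind of distinction the abstract calls topic development patterns, but classifying such patterns is beyond this sketch.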