Leveraging microblogging big data with a modified density-based clustering approach for event awareness and topic ranking

  • Authors:
  • Chung-Hong Lee; Tzan-Feng Chien

  • Venue:
  • Journal of Information Science
  • Year:
  • 2013

Abstract

Although diverse groups debate the potential and true value of social-media big data, there is no doubt that the era of big data exploitation has begun, driving the development of novel data-centric applications. Big data is notable not only for its size but also for the complexity that arises from its relationships to other data. In the past, limited access to big data meant that few data sources were available for researchers to develop advanced data-driven applications, such as monitoring emerging real-world events. Social media is now a major contributor to the growth of big data, and big data in turn gives enterprises the information they need to better detect marketing demands. Microblogging is a social network service capable of aggregating messages to explore facts and previously unknown knowledge. People now often search microblogging messages in real time for trending news and hot topics to satisfy their information needs. Under these circumstances, there is a real demand for a way to organize large numbers of microblogging messages into understandable events. In this work, we tackle this challenge by developing an online text-stream clustering approach that applies a modified density-based clustering model to collected microblogging big data. The system kernel combines three technical components: a dynamic term weighting scheme, a neighbourhood generation algorithm and an online density-based clustering technique. Once event topics have been detected, the system uses the developed topic ranking algorithm to recommend top-priority event information, helping people to organize emerging event data effectively.
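
The abstract does not give the details of the three components or of the ranking algorithm, so the following is only an illustrative sketch of how such a pipeline might fit together, not the authors' method. It assumes a smoothed TF-IDF recomputed from running document frequencies as the "dynamic" term weighting, a cosine-similarity threshold for neighbourhood generation, a simplified DBSCAN-style insertion step for online clustering, and cluster size as the ranking score. The class name `OnlineDensityClusterer`, the parameters `eps` and `min_pts`, and the sample messages are all hypothetical.

```python
import math
from collections import Counter


class OnlineDensityClusterer:
    """Illustrative online density-based clusterer for short text messages."""

    def __init__(self, eps=0.3, min_pts=3):
        self.eps = eps          # assumed cosine-similarity threshold for neighbours
        self.min_pts = min_pts  # assumed minimum neighbourhood size for a core point
        self.docs = []          # term counts per message
        self.labels = []        # cluster id per message, None means noise
        self.core = []          # True if the message was a core point on arrival
        self.df = Counter()     # running document frequency per term
        self.next_id = 0

    def _weights(self, counts):
        # Dynamic term weighting (assumption): smoothed TF-IDF computed from
        # the document frequencies seen so far in the stream.
        n = len(self.docs)
        return {t: c * (math.log((1 + n) / (1 + self.df[t])) + 1.0)
                for t, c in counts.items()}

    def _cosine(self, a, b):
        wa, wb = self._weights(a), self._weights(b)
        dot = sum(w * wb.get(t, 0.0) for t, w in wa.items())
        na = math.sqrt(sum(w * w for w in wa.values()))
        nb = math.sqrt(sum(w * w for w in wb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, text):
        """Insert one message and return its cluster label at insertion time."""
        counts = Counter(text.lower().split())
        self.df.update(counts.keys())
        self.docs.append(counts)
        self.labels.append(None)
        self.core.append(False)
        i = len(self.docs) - 1

        # Neighbourhood generation: earlier messages similar enough to this one.
        neighbours = [j for j in range(i)
                      if self._cosine(counts, self.docs[j]) >= self.eps]

        if len(neighbours) + 1 >= self.min_pts:
            # Dense neighbourhood: the new message is a core point.
            self.core[i] = True
            existing = {self.labels[j] for j in neighbours if self.core[j]}
            if existing:
                cid = min(existing)
                # Clusters connected through this core point are merged.
                for j in range(i):
                    if self.labels[j] in existing:
                        self.labels[j] = cid
            else:
                cid = self.next_id
                self.next_id += 1
            # Noise neighbours become border points of the cluster.
            for j in neighbours:
                if self.labels[j] is None:
                    self.labels[j] = cid
            self.labels[i] = cid
        else:
            # Sparse neighbourhood: join a core neighbour's cluster if there is
            # one, otherwise stay noise. Simplification: earlier messages are
            # not re-checked for promotion to core points.
            core_neighbours = [j for j in neighbours if self.core[j]]
            if core_neighbours:
                self.labels[i] = self.labels[core_neighbours[0]]
        return self.labels[i]

    def rank_topics(self):
        # Topic ranking (assumption): larger clusters rank higher.
        sizes = Counter(label for label in self.labels if label is not None)
        return sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample stream.
messages = [
    "earthquake hits tokyo",
    "tokyo earthquake strong",
    "earthquake tokyo damage",
    "football final tonight",
    "earthquake tokyo news",
]
clusterer = OnlineDensityClusterer(eps=0.3, min_pts=3)
for message in messages:
    clusterer.add(message)
print(clusterer.labels)
print(clusterer.rank_topics())
```

Because weights are recomputed from the current document frequencies at comparison time, the similarity between two old messages can drift as the stream grows; a production system would also need to expire old messages, which this sketch omits.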