Using Incremental PLSI for Threshold-Resilient Online Event Analysis

  • Authors:
  • Tzu-Chuan Chou; Meng Chang Chen

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2008

Abstract

The goal of online event analysis is to detect events and their associated documents in real time from a continuous stream of documents generated by multiple information sources. Existing approaches (e.g., window-based, decay-function, and adaptive-threshold methods) incorporate the temporal relations of documents into traditional text categorization methods for event analysis. However, these methods suffer from the threshold dependence problem: their performance is acceptable only within a narrow range of thresholds, so it is difficult to designate an appropriate threshold in advance. In this paper, we propose a threshold-resilient algorithm, called Incremental Probabilistic Latent Semantic Indexing (IPLSI), which can capture the storyline development of an event without the threshold dependence problem. The IPLSI algorithm is theoretically sound and more efficient than naïve PLSI approaches. The results of the performance evaluation based on the TDT4 corpus show that the proposed algorithm reduces the error tradeoff cost of event detection by as much as 14.51% and increases the threshold range for acceptable performance by 300% to 800%.
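
To make the threshold dependence problem concrete, the following is a minimal sketch of how a detection cost can be swept over candidate thresholds to see which ones give acceptable performance. It is not the IPLSI algorithm and is not taken from the paper. The cost constants are the conventional TDT evaluation defaults and are assumed here; the function names, the toy scores, and the acceptability cutoff are all hypothetical.

```python
# Illustrative sketch only. The constants below are the conventional TDT
# evaluation defaults, assumed here rather than taken from the abstract.
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02


def detection_cost(scores, labels, threshold):
    """Normalized detection cost when every score >= threshold is flagged."""
    targets = sum(1 for y in labels if y)
    non_targets = len(labels) - targets
    misses = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    false_alarms = sum(
        1 for s, y in zip(scores, labels) if not y and s >= threshold
    )
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / non_targets if non_targets else 0.0
    cost = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)
    # Normalize by the cost of the better trivial system (flag all or none).
    return cost / min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))


def acceptable_thresholds(scores, labels, candidates, max_cost):
    """Candidate thresholds whose normalized cost does not exceed max_cost."""
    return [
        t for t in candidates if detection_cost(scores, labels, t) <= max_cost
    ]


# Hypothetical toy data: system confidence scores and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False]
candidates = [0.05, 0.25, 0.5, 0.95]

for t in candidates:
    print(t, round(detection_cost(scores, labels, t), 4))
print(acceptable_thresholds(scores, labels, candidates, max_cost=0.5))
```

On this toy data only one of the four candidate thresholds keeps the normalized cost under the chosen cutoff, which is the kind of narrow acceptable range the abstract describes for existing methods. A threshold-resilient method would be expected to yield a wider list under the same sweep.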