Extracting hot spots of topics from time-stamped documents

Authors:
Wei Chen;Parvathi Chundi
Venue:
Data & Knowledge Engineering
Year:
2011

Abstract

Identifying time periods with a burst of activity related to a topic is an important problem in analyzing time-stamped documents. In this paper, we propose an approach to extract a hot spot of a given topic in a time-stamped document set. Topics can be basic, containing a simple list of keywords, or complex. Complex topics are built from basic topics using the logical relationships and, or, and not. We introduce the presence measure of a topic, based on fuzzy set theory, to quantify the amount of information related to the topic in the document set. Each interval in the time period of the document set is associated with a numeric value, which we call the discrepancy score. A high discrepancy score indicates that the documents in the time interval are more focused on the topic than those outside the time interval. A hot spot of a given topic is defined as a time interval with the highest discrepancy score. We first describe a naive implementation for extracting hot spots. We then construct an algorithm called EHE (Efficient Hot Spot Extraction) that uses several strategies to improve performance. We also introduce the notion of a topic DAG to facilitate efficient computation of the presence measures of complex topics. The proposed approach is illustrated by several experiments on a subset of the TDT-Pilot Corpus and the DBLP conference data set. The experiments show that the proposed EHE algorithm significantly outperforms the naive one, and that the extracted hot spots of the given topics are meaningful.
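
The ideas in the abstract can be illustrated with a minimal Python sketch of a naive hot spot search. It is not the paper's method: the abstract does not give the exact formulas, so everything concrete here is an assumption. The presence of a basic topic is taken to be the fraction of its keywords found in a document, complex topics use the standard fuzzy operators (min for and, max for or, 1 - x for not), and the discrepancy score is taken to be the mean presence inside an interval minus the mean presence outside it. All function names (`basic`, `t_and`, `t_or`, `t_not`, `discrepancy`, `naive_hot_spot`) are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Words = FrozenSet[str]
Topic = Callable[[Words], float]          # maps a document to a presence in [0, 1]
Doc = Tuple[int, Words]                   # (time slot index, words in the document)


def basic(keywords: Iterable[str]) -> Topic:
    """Basic topic. ASSUMPTION: presence = fraction of keywords found in the doc."""
    kws = frozenset(keywords)
    return lambda words: len(kws & words) / len(kws)


# Complex topics. ASSUMPTION: standard fuzzy-set operators.
def t_and(a: Topic, b: Topic) -> Topic:
    return lambda words: min(a(words), b(words))


def t_or(a: Topic, b: Topic) -> Topic:
    return lambda words: max(a(words), b(words))


def t_not(a: Topic) -> Topic:
    return lambda words: 1.0 - a(words)


def discrepancy(in_sum: float, in_n: int, total_sum: float, total_n: int) -> float:
    """ASSUMED score: mean presence inside the interval minus mean presence outside."""
    out_n = total_n - in_n
    if in_n == 0 or out_n == 0:
        return float("-inf")              # no inside/outside contrast is possible
    return in_sum / in_n - (total_sum - in_sum) / out_n


def naive_hot_spot(
    docs: Iterable[Doc], topic: Topic, n_slots: int
) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Score every interval [i, j] of time slots and return the best one."""
    sums: List[float] = [0.0] * n_slots   # total presence per time slot
    counts: List[int] = [0] * n_slots     # number of documents per time slot
    for slot, words in docs:
        sums[slot] += topic(words)
        counts[slot] += 1
    total_sum, total_n = sum(sums), sum(counts)

    best_score, best_interval = float("-inf"), None
    for i in range(n_slots):              # O(n_slots^2) intervals in total
        in_sum, in_n = 0.0, 0
        for j in range(i, n_slots):
            in_sum += sums[j]             # extend the interval by one slot
            in_n += counts[j]
            score = discrepancy(in_sum, in_n, total_sum, total_n)
            if score > best_score:
                best_score, best_interval = score, (i, j)
    return best_score, best_interval


# Toy data: one document per time slot.
docs = [
    (0, frozenset({"sports", "game"})),
    (1, frozenset({"earthquake", "rescue"})),
    (2, frozenset({"earthquake", "damage"})),
    (3, frozenset({"election", "vote"})),
]
topic = t_and(basic(["earthquake"]), t_not(basic(["sports"])))
score, interval = naive_hot_spot(docs, topic, n_slots=4)
print(score, interval)
```

On this toy data the topic "earthquake and not sports" is fully present only in slots 1 and 2, so the search selects the interval (1, 2). The sketch enumerates all intervals with running sums, which is the kind of quadratic baseline an optimized algorithm such as EHE would aim to improve on; the actual EHE strategies and the topic DAG are not reproduced here.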