Information Preserving Time Decompositions of Time Stamped Documents

Authors:
Parvathi Chundi;Daniel J. Rosenkrantz
Affiliations:
Computer Science Department, University of Nebraska at Omaha, Omaha, USA 68106;Computer Science Department, SUNY at Albany, Albany, USA 12222
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

Extraction of sequences of events from news and other documents based on the publication times of these documents has been shown to be extremely effective in tracking past events. This paper addresses the issue of constructing an optimal information preserving decomposition of the time period associated with a given document set, i.e., a decomposition with the smallest number of subintervals, subject to no loss of information. We introduce the notion of the compressed interval decomposition, where each subinterval consists of consecutive time points having identical information content. We define optimality, and show that any optimal information preserving decomposition of the time period is a refinement of the compressed interval decomposition. We define several special classes of measure functions (functions that measure the prevalence of keywords in the document set and assign them numeric values), based on their effect on the information computed as document sets are combined. We give algorithms, appropriate for different classes of measure functions, for computing an optimal information preserving decomposition of a given document set. We studied the effectiveness of these algorithms by computing several compressed interval and information preserving decompositions for a subset of the Reuters-21578 document set. The experiments support the obvious conclusion that the temporal information gleaned from a document set is strongly dependent on the measure function used and on other user-defined parameters.
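
As an illustration of the compressed interval decomposition described in the abstract, the following minimal Python sketch groups consecutive time points whose information content is identical. It is not the paper's algorithm: the function name `compressed_interval_decomposition`, the modelling of "information content" as a frozenset of keywords, and the sample data are all assumptions made for this example. In the paper, the information content is derived from a measure function and user-defined parameters, which this sketch does not model.

```python
from itertools import groupby


def compressed_interval_decomposition(info_per_point):
    """Group consecutive time points with identical information content.

    info_per_point: list whose entry i is the information content of
    time point i (assumed here to be a frozenset of keywords).
    Returns a list of (start, end, info) tuples with inclusive indices.
    """
    intervals = []
    start = 0
    # groupby merges only adjacent equal items, which matches the
    # "consecutive time points" requirement.
    for info, run in groupby(info_per_point):
        length = sum(1 for _ in run)
        intervals.append((start, start + length - 1, info))
        start += length
    return intervals


# Hypothetical per-time-point keyword sets (illustrative data only).
points = [
    frozenset({"oil"}),
    frozenset({"oil"}),
    frozenset({"oil", "trade"}),
    frozenset({"trade"}),
    frozenset({"trade"}),
]
decomposition = compressed_interval_decomposition(points)
```

Under these assumptions, the five time points collapse into three subintervals: points 0 to 1, point 2 alone, and points 3 to 4.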