Efficient algorithms for constructing time decompositions of time stamped documents
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Extraction of sequences of events from news and other documents based on the publication times of these documents has been shown to be extremely effective in tracking past events. This paper addresses the issue of constructing an optimal information preserving decomposition of the time period associated with a given document set, i.e., a decomposition with the smallest number of subintervals, subject to no loss of information. We introduce the notion of the compressed interval decomposition, where each subinterval consists of consecutive time points having identical information content. We define optimality, and show that any optimal information preserving decomposition of the time period is a refinement of the compressed interval decomposition. We define several special classes of measure functions (functions that measure the prevalence of keywords in the document set and assign them numeric values), based on their effect on the information computed as document sets are combined. We give algorithms, appropriate for different classes of measure functions, for computing an optimal information preserving decomposition of a given document set. We study the effectiveness of these algorithms by computing several compressed interval and information preserving decompositions for a subset of the Reuters-21578 document set. The experiments support the obvious conclusion that the temporal information gleaned from a document set is strongly dependent on the measure function used and on other user-defined parameters.
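To make the compressed interval decomposition concrete, the following Python sketch merges consecutive time points whose information content is identical. The abstract does not give the paper's formal definitions, so several pieces here are assumptions made only for illustration: the measure function is taken to be a plain document count per keyword, the "information content" of a time point is taken to be the set of keywords whose count reaches a user-chosen threshold, and the names `significant_keywords` and `compressed_interval_decomposition` are hypothetical.

```python
from itertools import groupby


def significant_keywords(docs, threshold):
    """Assumed information content of one time point: the set of keywords
    that occur in at least `threshold` documents (a count measure)."""
    counts = {}
    for doc in docs:
        for kw in set(doc):  # count each keyword once per document
            counts[kw] = counts.get(kw, 0) + 1
    return frozenset(kw for kw, c in counts.items() if c >= threshold)


def compressed_interval_decomposition(docs_by_time, threshold=1):
    """Group consecutive time points with identical information content.

    docs_by_time: list indexed by time point; each entry is a list of
    documents, and each document is an iterable of keywords.
    Returns a list of (start, end, info) tuples with inclusive bounds.
    """
    infos = [significant_keywords(docs, threshold) for docs in docs_by_time]
    intervals = []
    start = 0
    # groupby yields one group per run of equal consecutive values
    for info, run in groupby(infos):
        length = sum(1 for _ in run)
        intervals.append((start, start + length - 1, info))
        start += length
    return intervals


# Hypothetical toy data: five time points, keywords per document.
docs_by_time = [
    [["oil", "opec"]],
    [["opec", "oil"]],
    [["wheat"]],
    [["wheat"], ["wheat", "corn"]],
    [],
]
for interval in compressed_interval_decomposition(docs_by_time, threshold=1):
    print(interval)
```

Raising `threshold` to 2 on the same toy data collapses the first three time points into one subinterval with empty information content, which illustrates the abstract's point that the resulting decomposition depends strongly on the measure function and on user-defined parameters.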