Extracting interesting information from large unstructured document sets is a time-consuming task. In this paper, we describe an approach, based on time series segmentation, for analyzing the temporal trends of a given topic in a time-stamped document set. We consider topics containing multiple keywords and use a fuzzy-set-based method to compute a numeric value that measures the relevance of a document set to the given topic. This relevance measure is then used to assign a discrepancy score to a segmentation of the time period associated with the document set. The discrepancy score of a segmentation represents the likelihood of the topic across all segments of that segmentation. Given a user-specified value k, we define the min difference k segmentation as the k-segmentation with the maximum possible discrepancy score, and we describe a dynamic-programming algorithm to compute it. The approach is illustrated by several experiments on a subset of the TDT-Pilot Corpus data set. Our experiments show that the min difference k segmentation successfully highlights the temporal trends of a topic using k segments.
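As a rough illustration of the dynamic-programming idea mentioned above, the following Python sketch finds a k-segmentation of a sequence of per-period relevance values that maximizes a sum of per-segment scores. The abstract does not define the discrepancy score, so `deviation_score` is a hypothetical stand-in (segment length times the squared deviation of the segment mean from the overall mean), and the names `best_k_segmentation` and `relevance` are likewise assumptions made for this example. It should be read as a generic segmentation sketch under those assumptions, not as the algorithm described in the paper.

```python
from typing import Callable, List, Sequence, Tuple


def deviation_score(values: Sequence[float], start: int, end: int) -> float:
    """Hypothetical stand-in for a per-segment discrepancy score:
    length of values[start:end] times the squared gap between the
    segment mean and the overall mean."""
    overall = sum(values) / len(values)
    seg = values[start:end]
    mean = sum(seg) / len(seg)
    return len(seg) * (mean - overall) ** 2


def best_k_segmentation(
    values: Sequence[float],
    k: int,
    segment_score: Callable[[Sequence[float], int, int], float] = deviation_score,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split values into k contiguous, non-empty half-open segments
    [start, end) so that the summed segment_score is as large as possible."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    neg_inf = float("-inf")
    # best[j][i]: best total score for the first i points using j segments.
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    # cut[j][i]: start index of the last segment in that best solution.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):
                if best[j - 1][s] == neg_inf:
                    continue  # first s points cannot form j - 1 segments
                total = best[j - 1][s] + segment_score(values, s, i)
                if total > best[j][i]:
                    best[j][i] = total
                    cut[j][i] = s
    # Walk the stored cut points backwards to recover the segments.
    segments: List[Tuple[int, int]] = []
    end = n
    for j in range(k, 0, -1):
        start = cut[j][end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return best[k][n], segments


# Made-up relevance values for six time periods (not data from the paper).
relevance = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1]
score, segments = best_k_segmentation(relevance, 3)
print(segments)  # [(0, 2), (2, 4), (4, 6)]
```

With this stand-in score, the sketch isolates the two high-relevance periods in the middle as their own segment. The triple loop runs in time proportional to k times the square of the number of time points, multiplied by the cost of one score evaluation; a real implementation would likely precompute prefix sums to make each score evaluation constant-time.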