This paper presents an evolutionary algorithm for modeling the arrival dates of document streams, that is, time-stamped collections of documents such as newscasts, e-mails, scientific journal archives, and weblog postings. The goal is to find a frequency curve that fits the data while filtering out the unavoidable noise. Classical dynamic programming algorithms are limited by their memory and running-time requirements, which can be a problem when dealing with long streams. This suggests exploring alternative search methods that, although they do not guarantee optimality, are far more efficient. Experiments show that the designed evolutionary algorithm reaches high-quality solutions in a short time. We have also explored different approaches to inferring whether new arrivals increase or decrease interest in the topic of the document stream. In particular, we present a variant of the evolutionary algorithm that can very quickly fit a stream extended with new data by taking advantage of the fit obtained for the original substream. These mechanisms can be used for real-time detection of changes in the trend of interest in a topic, an important application of this kind of model.
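The idea can be illustrated with a minimal Python sketch. It is not the algorithm from the paper. It assumes a burst-style model in which each inter-arrival gap is assigned one of a few discrete rate levels, gaps follow an exponential density, and moving to a higher level costs a penalty `gamma`. The function names (`cost`, `mutate`, `evolve`), the segment-overwrite mutation, the elitist selection, and all parameter values are hypothetical choices made for illustration only. The `seed_states` argument mimics the warm start described above: the fit of the original substream is reused as the starting individual for the extended stream.

```python
import math
import random


def cost(states, gaps, rates, gamma=1.0):
    """Negative log-likelihood of the gaps under exponential densities,
    plus a penalty for every move to a higher rate level (assumed model)."""
    total = 0.0
    prev = 0
    for s, x in zip(states, gaps):
        total += rates[s] * x - math.log(rates[s])  # -log(rate * exp(-rate * x))
        if s > prev:
            total += gamma * (s - prev)
        prev = s
    return total


def mutate(states, n_levels, rng):
    """Overwrite one random contiguous segment with a single random level."""
    child = list(states)
    i = rng.randrange(len(child))
    j = rng.randrange(i, len(child))
    level = rng.randrange(n_levels)
    for k in range(i, j + 1):
        child[k] = level
    return child


def evolve(gaps, rates, seed_states=None, pop_size=20, generations=200,
           gamma=1.0, seed=0):
    """Elitist evolutionary search for a low-cost state sequence.

    If seed_states is given (a fit of an earlier, shorter stream), it is
    used as the prefix of the initial individual; new positions start at 0.
    """
    rng = random.Random(seed)
    n = len(gaps)
    prefix = list(seed_states or [])[:n]
    base = prefix + [0] * (n - len(prefix))
    population = [base] + [mutate(base, len(rates), rng)
                           for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), len(rates), rng)
                    for _ in range(pop_size)]
        # Keep the best pop_size individuals; the best one is never lost.
        population = sorted(population + children,
                            key=lambda s: cost(s, gaps, rates, gamma))[:pop_size]
    return population[0]


if __name__ == "__main__":
    # Synthetic gaps (in arbitrary time units): slow, then bursty, then slow.
    gaps = [5.0] * 10 + [0.5] * 10 + [5.0] * 10
    rates = [0.2, 2.0]  # assumed low and high arrival rates
    fit = evolve(gaps, rates)
    # New documents arrive: refit quickly, starting from the previous fit.
    extended = gaps + [0.5] * 5
    refit = evolve(extended, rates, seed_states=fit, generations=20)
    print(fit)
    print(refit)
```

Because selection is elitist, the returned sequence never costs more than the starting individual, which is why a warm start from a previous fit can get away with far fewer generations than a fit from scratch. Unlike dynamic programming, the search offers no optimality guarantee.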