The Talent system: TEXTRACT architecture and data model
Natural Language Engineering
ICML '06 Proceedings of the 23rd international conference on Machine learning
Topics over time: a non-Markov continuous-time model of topical trends
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Participants in on-line discussion forums and decision makers are interested in understanding real-time communications between large numbers of parties on the internet and intranet. As a first step towards addressing this challenge, we developed a prototype to quickly identify and track topics in large, dynamic data sets based on assignment of documents to time slices, fast approximation of cluster centroids to identify discussion topics, and inter-slice correspondence mappings of topics. To verify our method, we conducted implementation studies with data from Innovation Jam 2006, an on-line brainstorming session, in which participants around the globe posted more than 37,000 opinions. Results from our prototype are consistent with the text in the postings, and would have required considerable effort to discover manually.