IBM's story segmentation system combines decision-tree and maximum-entropy models, which take a variety of lexical, prosodic, semantic, and structural features as input. Both models are source-specific, and combining them substantially lowers the segmentation cost Cseg. IBM's topic detection system introduces a minimal hierarchy into the clustering: each cluster comprises one or more microclusters. We investigate the importance of merging microclusters and propose a merging strategy that improves our performance.
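One common way to combine two source-specific boundary detectors is to interpolate their posterior scores and threshold the result. The sketch below illustrates that idea; the interpolation weight, the 0.5 threshold, and the function names are illustrative assumptions, not the actual combination method used in the system described above.

```python
# Hypothetical sketch: combining decision-tree and maximum-entropy story-boundary
# posteriors by linear interpolation. The weight and threshold values here are
# assumptions for illustration, not the paper's tuned parameters.

def combine_boundary_scores(dt_probs, maxent_probs, weight=0.6):
    """Linearly interpolate the two models' boundary posteriors."""
    return [weight * d + (1 - weight) * m
            for d, m in zip(dt_probs, maxent_probs)]

def predict_boundaries(combined, threshold=0.5):
    """Hypothesize a story boundary wherever the combined score exceeds threshold."""
    return [score > threshold for score in combined]

# Toy example: three candidate boundary points scored by each model.
dt = [0.9, 0.2, 0.4]
me = [0.8, 0.1, 0.7]
scores = combine_boundary_scores(dt, me)   # [0.86, 0.16, 0.52]
print(predict_boundaries(scores))          # [True, False, True]
```

In practice the interpolation weight would be tuned on held-out data to minimize the segmentation cost, and the threshold chosen per source, since the models themselves are source-specific.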