Pattern Recognition Letters
Mathematical Morphology (MM) offers a generic theoretical framework for data processing and analysis. Nevertheless, it is still used mainly for image analysis and processing, and attempts to apply MM to other kinds of data remain rare. We believe MM can provide relevant solutions for data analysis and processing in a far broader range of application fields. To illustrate this, we focus on textual data and show how morphological operators (here, morphological segmentation using the watershed transform) may be applied to such data. We thus provide an original MM-based solution to the thematic segmentation problem, a typical problem in natural language processing and information retrieval (IR). More precisely, we consider TV broadcasts through their transcripts obtained by automatic speech recognition. To perform topic segmentation, we compute the similarity between successive segments using vectorization, a technique recently introduced in the IR field. We then apply a gradient operator to build a topographic surface, which is segmented using the watershed transform. This new topic segmentation technique is evaluated on two corpora of TV broadcasts, on which it outperforms other existing approaches. Although we use only very common morphological operators (i.e., the standard watershed transform), we thus show the potential of MM for non-image data.
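The pipeline outlined in the abstract (similarity between successive segments, a topographic surface, then a watershed) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation, and it rests on several assumptions. Cosine similarity over raw word counts stands in for the vectorization technique. The dissimilarity between neighbouring segments stands in for the gradient operator. The watershed is a simplified immersion-style flooding of a 1-D profile. The function names `cosine`, `dissimilarity_profile`, `watershed_1d` and `topic_boundaries` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dissimilarity_profile(segments):
    """Height of the surface at gap i = 1 - similarity(segment i, segment i+1).

    Assumption: plain word counts replace the vectorization step, and this
    neighbour dissimilarity replaces the morphological gradient.
    """
    bags = [Counter(s.lower().split()) for s in segments]
    return [1.0 - cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def watershed_1d(surface):
    """Flood a 1-D profile from its lowest points upward.

    Each position joins the basin of an already flooded neighbour, starts a
    new basin if no neighbour is flooded, or becomes a watershed point when
    it would merge two different basins. Returns sorted watershed indices.
    """
    n = len(surface)
    labels = [None] * n          # None: not flooded, 0: watershed, >0: basin id
    next_label = 1
    ridges = []
    for i in sorted(range(n), key=lambda k: (surface[k], k)):
        basins = {labels[j] for j in (i - 1, i + 1) if 0 <= j < n and labels[j]}
        if not basins:           # no flooded neighbour: a new minimum
            labels[i] = next_label
            next_label += 1
        elif len(basins) == 1:   # extend the single adjacent basin
            labels[i] = basins.pop()
        else:                    # two basins meet here: watershed point
            labels[i] = 0
            ridges.append(i)
    return sorted(ridges)


def topic_boundaries(segments):
    """Index i in the result means a topic change between segment i and i+1."""
    return watershed_1d(dissimilarity_profile(segments))


if __name__ == "__main__":
    segments = [
        "election vote parliament minister",
        "minister vote election campaign",
        "parliament election minister debate",
        "football match goal team",
        "team goal match league",
        "football league team coach",
    ]
    print(topic_boundaries(segments))
```

On this toy input the only watershed point falls in the gap between the third and fourth segments, where the vocabulary changes completely. A raw watershed tends to over-segment noisy profiles, so a real system would likely need some smoothing or filtering of the surface first; that step is left out of this sketch.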