Attention, intentions, and the structure of discourse
Computational Linguistics
Automatic text processing
TextTiling: A Quantitative Approach to Discourse
Resolving zero anaphora in Japanese
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Intention-based segmentation: human reliability and correlation with linguistic cues
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Text segmentation based on similarity between words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Cut as a querying unit for WWW, Netnews, and E-mail
Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space - structure in hypermedia systems
Nine Issues in Speech Translation
Machine Translation
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
Text Segmentation into Paragraphs Based on Local Text Cohesion
TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
A bootstrapping approach for robust topic analysis
Natural Language Engineering
How to thematically segment texts by using lexical cohesion?
ACL '98 Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2
Thematic segmentation of texts: two methods for two kinds of texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A new hybrid summarizer based on vector space model, statistical physics and linguistics
MICAI'07 Proceedings of the 6th Mexican international conference on Advances in artificial intelligence
Text segmentation based on document understanding for information retrieval
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
The paper presents a new approach to text segmentation, the task of dividing a text into coherent discourse units. The approach builds on the theory of the discourse segment (Nomoto and Nitta, 1993) and incorporates ideas from information retrieval research (Salton, 1988). A discourse segment is a structural unit of Japanese discourse: it can be thought of as a linguistic unit demarcated by wa, the Japanese topic particle, and it may extend over several sentences. The segmentation operates on discourse segments and uses a coherence measure based on tf-idf, a standard information retrieval weighting (Salton, 1988; Hearst, 1993). Experiments were conducted on a Japanese newspaper corpus. The approach proved quite successful in recovering articles from the unstructured corpus.
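The abstract does not spell out the coherence measure, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each discourse segment has already been identified and tokenized, weights terms by a plain tf-idf scheme, scores adjacent segments by cosine similarity, and places a boundary wherever the score falls below a threshold. The function names, the idf formula, the threshold value, and the toy English tokens are all illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Return one sparse tf-idf vector (a dict) per tokenized segment.

    Assumption: idf is log(N / df), with each segment treated as a document.
    """
    n = len(segments)
    # Document frequency: number of segments in which each term occurs.
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(segments):
    """Coherence score for each pair of neighbouring segments."""
    vecs = tfidf_vectors(segments)
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


def boundaries(segments, threshold=0.1):
    """Indices i such that a boundary is proposed just before segment i.

    The threshold is a hypothetical tuning parameter.
    """
    scores = adjacent_coherence(segments)
    return [i + 1 for i, s in enumerate(scores) if s < threshold]


# Toy input: four pre-tokenized segments drawn from two unrelated "articles".
example = [
    ["election", "vote", "party"],
    ["vote", "party", "minister"],
    ["rain", "typhoon", "coast"],
    ["typhoon", "coast", "wind"],
]
print(adjacent_coherence(example))
print(boundaries(example))
```

On this toy input the middle pair of segments shares no vocabulary, so its coherence score is zero and the only proposed boundary falls between the second and third segments. For Japanese text, the segments would instead come from a step that splits on the topic particle wa and tokenizes the result, which is outside the scope of this sketch.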