A grammatico-statistical approach to discourse partitioning

Authors:
Tadashi Nomoto;Yoshihiko Nitta
Affiliations:
Advanced Research Laboratory, Hitachi Ltd.;Advanced Research Laboratory, Hitachi Ltd.
Venue:
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Year:
1994

References (6)

Attention, intentions, and the structure of discourse. Computational Linguistics
Automatic text processing
TextTiling: A Quantitative Approach to Discourse
Resolving zero anaphora in Japanese. EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Intention-based segmentation: human reliability and correlation with linguistic cues. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Text segmentation based on similarity between words. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics

Cited by (10)

Cut as a querying unit for WWW, Netnews, and E-mail. Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space - structure in hypermedia systems
Nine Issues in Speech Translation. Machine Translation
A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics
Text Segmentation into Paragraphs Based on Local Text Cohesion. TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics
A bootstrapping approach for robust topic analysis. Natural Language Engineering
How to thematically segment texts by using lexical cohesion? ACL '98 Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2
Thematic segmentation of texts: two methods for two kinds of texts. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A new hybrid summarizer based on vector space model, statistical physics and linguistics. MICAI'07 Proceedings of the 6th Mexican international conference on Advances in artificial intelligence
Text segmentation based on document understanding for information retrieval. NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems

Abstract

The paper presents a new approach to text segmentation, the task of dividing a text into coherent discourse units. The approach builds on the theory of discourse segments (Nomoto and Nitta, 1993) and incorporates ideas from research on information retrieval (Salton, 1988). A discourse segment reflects the structure of Japanese discourse: it can be thought of as a linguistic unit demarcated by wa, the Japanese topic particle, and may extend over several sentences. The segmentation operates on discourse segments and uses a coherence measure based on tf-idf, a standard term-weighting measure in information retrieval (Salton, 1988; Hearst, 1993). Experiments were carried out on a Japanese newspaper corpus. The approach was found to be quite successful in recovering articles from the unstructured corpus.
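
As a rough illustration of the idea of a tf-idf based coherence measure between adjacent discourse segments, the following minimal sketch may help. It is not the authors' implementation: the abstract does not give the exact formula, so the cosine similarity, the log(N/df) weighting, the threshold value, and all function names here are assumptions. It also assumes the text has already been split into discourse segments and tokenized; the toy tokens are in English purely for readability.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Build one sparse tf-idf vector (dict: term -> weight) per segment.

    Assumption: weight = raw term frequency * log(N / document frequency),
    where each segment plays the role of a "document".
    """
    n = len(segments)
    df = Counter()
    for seg in segments:
        df.update(set(seg))  # count each term once per segment
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(segments):
    """Coherence score for each pair of neighbouring segments."""
    vecs = tfidf_vectors(segments)
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


def boundaries(segments, threshold=0.1):
    """Indices i such that a boundary is placed just before segment i.

    Assumption: a boundary is hypothesized wherever coherence between
    neighbours falls below a fixed, hypothetical threshold.
    """
    scores = adjacent_coherence(segments)
    return [i + 1 for i, s in enumerate(scores) if s < threshold]


# Toy input: four pre-tokenized segments drawn from two unrelated "articles".
segments = [
    ["election", "vote", "party"],
    ["vote", "party", "minister"],
    ["typhoon", "rain", "coast"],
    ["rain", "coast", "flood"],
]
print(adjacent_coherence(segments))
print(boundaries(segments))
```

On this toy input the middle pair of segments shares no terms, so the only boundary falls before the third segment, which is the kind of article break the experiments aim to recover. A fixed threshold is used only to keep the sketch short; other boundary criteria would fit the same structure.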