Using patterns of thematic progression for building a table of contents of a text

Authors:
Marie-Francine Moens
Affiliations:
Interdisciplinary Centre for Law and Information Technology, Katholieke Universiteit Leuven, Tiensestraat 41, B-3000 Leuven, Belgium. E-mail: marie-france.moens@law.kuleuven.be
Venue:
Natural Language Engineering
Year:
2008


Abstract

A text usually contains one or a few main topics, which are divided into subtopics, which in turn can be further described by more detailed topics. In this article we describe a system that segments a text into topics and subtopics. Each segment is characterized by important key terms extracted from it and by its start and end positions in the text. A table of contents is built from the hierarchical and sequential relationships between the topical segments identified in a text. The table of contents generator relies on universal linguistic theories of the topic and comment of a sentence and on patterns of thematic progression in text. These theories of topic and comment are modeled both deterministically and probabilistically. The system is applied to English texts (news, World Wide Web, and encyclopedia texts) and evaluated.
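
To make the output structure concrete, the following is a minimal sketch of how topical segments carrying key terms and start and end positions might be nested and rendered as a table of contents. It is an illustration only, not the system described in the article: the `Segment` class, the `render_toc` function, the sample key terms, and the character offsets are all hypothetical, and the segmentation itself (topic and comment detection, thematic progression) is assumed to have been done already.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical topical segment: key terms plus character offsets in the text."""
    key_terms: List[str]
    start: int
    end: int
    # Subtopics nested under this segment (the hierarchical relationship).
    children: List["Segment"] = field(default_factory=list)


def render_toc(segments: List[Segment], depth: int = 0) -> List[str]:
    """Return one line per segment, in text order, indented by nesting depth."""
    lines: List[str] = []
    # Sorting by start offset preserves the sequential relationship.
    for seg in sorted(segments, key=lambda s: s.start):
        label = ", ".join(seg.key_terms)
        lines.append(f"{'  ' * depth}{label} [{seg.start}-{seg.end}]")
        lines.extend(render_toc(seg.children, depth + 1))
    return lines


# Invented segmentation of a 900-character text, for illustration only.
toc = render_toc([
    Segment(["volcano", "eruption"], 0, 520, [
        Segment(["lava", "flow"], 0, 240),
        Segment(["ash", "cloud"], 241, 520),
    ]),
    Segment(["evacuation"], 521, 900),
])
print("\n".join(toc))
```

Each entry pairs a segment's key terms with its span, so a reader of the generated table of contents can jump to the corresponding passage; indentation reflects the topic and subtopic hierarchy.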