A text usually contains one or a few main topics, which divide into subtopics that can in turn be described by more detailed topics. In this article we describe a system that segments a text into topics and subtopics. Each segment is characterized by the important key terms extracted from it and by its start and end positions in the text. A table of contents is built from the hierarchical and sequential relationships between the topical segments identified in the text. The table of contents generator relies on universal linguistic theories of the topic and comment of a sentence and on patterns of thematic progression in text. The theories of topic and comment are modeled both deterministically and probabilistically. The system is applied to English texts (news, World Wide Web, and encyclopedia texts) and evaluated.
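To make the segment representation concrete, the following is a minimal sketch of how topical segments and a table of contents built from them might be stored and rendered. It is not the system's actual implementation: the names `Segment` and `render_toc`, the use of character offsets for positions, and the sample key terms and offsets are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical topical segment: key terms plus its span in the text."""
    key_terms: List[str]
    start: int  # assumed: inclusive character offset where the segment begins
    end: int    # assumed: exclusive character offset where the segment ends
    children: List["Segment"] = field(default_factory=list)  # subtopics


def render_toc(segments: List[Segment], depth: int = 0) -> List[str]:
    """Return one line per segment, in text order, indented by nesting depth."""
    lines = []
    # Sequential relationship: siblings are ordered by their start position.
    for seg in sorted(segments, key=lambda s: s.start):
        label = ", ".join(seg.key_terms)
        lines.append(f"{'  ' * depth}{label} [{seg.start}-{seg.end}]")
        # Hierarchical relationship: subtopics are listed under their parent.
        lines.extend(render_toc(seg.children, depth + 1))
    return lines


# Invented sample data, for illustration only.
doc = [
    Segment(["economy"], 900, 1500),
    Segment(["election"], 0, 900, [
        Segment(["turnout"], 400, 900),
        Segment(["candidates"], 0, 400),
    ]),
]
toc = render_toc(doc)
print("\n".join(toc))
```

Sorting siblings by start position captures the sequential relationship between segments, while the `children` list captures the hierarchical one; how the segments and key terms are actually detected is outside the scope of this sketch.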