A statistical model for domain-independent text segmentation

Authors:
Masao Utiyama; Hitoshi Isahara
Affiliations:
Communications Research Laboratory, Soraku-gun, Kyoto, Japan; Communications Research Laboratory, Soraku-gun, Kyoto, Japan
Venue:
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Year:
2001

Abstract

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than, or at least as accurate as, a state-of-the-art text segmentation system.
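
The idea of a maximum-probability segmentation estimated from the text itself can be illustrated with a minimal sketch. The abstract does not give the model's details, so everything specific here is an assumption made for illustration: each segment is scored with a Laplace-smoothed unigram model built from that segment's own word counts, every segment pays a fixed penalty of log(total word count) as a stand-in for a prior over segmentations, and the best segmentation is found by dynamic programming over sentence boundaries. The function names `segment_cost` and `segment` are hypothetical.

```python
import math
from collections import Counter


def segment_cost(words, vocab_size, penalty):
    """Cost (negative log-probability) of one candidate segment.

    Assumption: word probabilities are Laplace-smoothed relative
    frequencies estimated from the segment itself, so no training
    data is needed. `penalty` discourages over-segmentation.
    """
    counts = Counter(words)
    n = len(words)
    cost = 0.0
    for count in counts.values():
        cost -= count * math.log((count + 1) / (n + vocab_size))
    return cost + penalty


def segment(sentences):
    """Return the minimum-cost segmentation as (start, end) sentence ranges.

    `sentences` is a list of token lists. Minimizing the summed cost is
    equivalent to maximizing the product of segment probabilities.
    """
    m = len(sentences)
    if m == 0:
        return []
    total_words = sum(len(s) for s in sentences)
    vocab_size = len({w for s in sentences for w in s})
    # Assumed per-segment penalty; guarded so it stays positive.
    penalty = math.log(max(total_words, 2))

    best = [0.0] + [math.inf] * m  # best[j]: min cost of sentences[:j]
    back = [0] * (m + 1)           # back[j]: start of the last segment
    for j in range(1, m + 1):
        for i in range(j):
            words = [w for s in sentences[i:j] for w in s]
            cost = best[i] + segment_cost(words, vocab_size, penalty)
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Follow back-pointers from the end to recover the boundaries.
    bounds = []
    j = m
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    bounds.reverse()
    return bounds
```

Calling `segment` on a list of tokenized sentences returns contiguous, non-overlapping ranges that cover the whole text. This sketch recomputes word counts for every candidate segment, which is simple but slow; cumulative counts would be the natural optimization for longer texts.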