A critique and improvement of an evaluation metric for text segmentation

Authors:
Lev Pevzner; Marti A. Hearst
Affiliations:
Harvard University, 380 Leverett Mail Center, Cambridge, MA; University of California, Berkeley, 102 South Hall #4600, Berkeley, CA
Venue:
Computational Linguistics
Year:
2002

Abstract

The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-size window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
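
The window-based comparison described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' reference implementation: the function name `window_diff`, the encoding of a segmentation as a list of 0/1 flags (one flag per gap between adjacent text units), and the caller-supplied window size `k` are all choices made here for the example. A common convention in the segmentation literature is to set `k` to about half the mean reference segment length, but the abstract does not fix a value, so it is left as a parameter.

```python
from typing import Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Sketch of a WindowDiff-style score (0.0 is a perfect match).

    Assumed encoding (not taken from the abstract): each sequence has one
    entry per gap between adjacent text units, 1 if a segment boundary
    falls in that gap and 0 otherwise. A window of size k covers k
    consecutive gaps.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n_units = len(reference) + 1  # N text units have N - 1 gaps
    if not 1 <= k < n_units:
        raise ValueError("k must satisfy 1 <= k < number of text units")

    n_windows = n_units - k
    errors = 0
    for start in range(n_windows):
        # Count boundaries inside the current window for both segmentations.
        ref_count = sum(reference[start:start + k])
        hyp_count = sum(hypothesis[start:start + k])
        # Penalize the window whenever the two counts disagree.
        if ref_count != hyp_count:
            errors += 1
    return errors / n_windows


# Ten text units, true boundaries after units 3 and 7.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0]
near_miss = [0, 0, 0, 1, 0, 0, 1, 0, 0]  # first boundary placed one unit late
no_bounds = [0] * 9                      # hypothesis with no boundaries at all

print(window_diff(reference, reference, k=2))  # 0.0
print(window_diff(reference, near_miss, k=2))  # 0.25
print(window_diff(reference, no_bounds, k=2))  # 0.5
```

In this toy case the near miss is penalized in two of the eight windows, while missing both boundaries is penalized in four, so the score separates a small placement error from an outright miss.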