Integrated Computer-Aided Engineering
This paper presents a novel domain-independent text segmentation method that identifies topic-change boundaries in long text documents and text streams. The method consists of three components. First, as a preprocessing step, we eliminate document-dependent stop words as well as generic stop words before sentence similarity is computed, which helps discriminate the semantic information carried by each sentence. Then the cohesion information of the sentences in a document or text stream is captured in a sentence-distance matrix in which each entry corresponds to the similarity between a pair of sentences. The distance matrix can be represented as a gray-scale image, so the text segmentation problem is converted into an image segmentation problem. We apply anisotropic diffusion to the image representation of the distance matrix to enhance the semantic cohesion of sentence topical groups and to sharpen topical boundaries. Finally, dynamic programming is adapted to find the optimal topical boundaries and to provide a zoom-in and zoom-out mechanism for topic access by segmenting the text into a variable number of sentence topical groups. Our approach involves no domain-specific training and can be applied to texts in a variety of domains. Experimental results show that our approach is effective in text segmentation and outperforms several state-of-the-art methods.
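
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: sentences are compared with bag-of-words cosine similarity, the diffusion step uses the standard Perona-Malik scheme with an exponential conduction function, and the dynamic program minimizes the summed within-segment dissimilarity for a given number of segments. The function names (`similarity_matrix`, `anisotropic_diffusion`, `segment`) and the parameters `kappa`, `step`, and `n_iter` are hypothetical.

```python
import numpy as np


def similarity_matrix(sentences, stop_words=frozenset()):
    """Cosine similarity between bag-of-words sentence vectors (assumed measure)."""
    tokens = [[w for w in s.lower().split() if w not in stop_words]
              for s in sentences]
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokens for w in t}))}
    X = np.zeros((len(sentences), len(vocab)))
    for row, words in enumerate(tokens):
        for w in words:
            X[row, vocab[w]] += 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave empty sentences as zero vectors
    X = X / norms
    return X @ X.T


def anisotropic_diffusion(img, n_iter=20, kappa=0.3, step=0.2):
    """Perona-Malik diffusion: smooth inside regions, preserve strong edges."""
    u = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")            # replicate border pixels
        diffs = [p[:-2, 1:-1] - u, p[2:, 1:-1] - u,   # north, south
                 p[1:-1, :-2] - u, p[1:-1, 2:] - u]   # west, east
        # Conduction is near 1 for small differences and near 0 across edges.
        u += step * sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
    return u


def segment(S, k):
    """Split n sentences into k contiguous groups; return the k-1 start indices."""
    n = S.shape[0]
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = (1.0 - S).cumsum(0).cumsum(1)    # 2-D prefix sums of dissimilarity

    def cost(a, b):
        # Total dissimilarity inside the square block of sentences a..b-1.
        return P[b, b] - P[a, b] - P[b, a] + P[a, a]

    best = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1, i] + cost(i, j)
                if c < best[m, j]:
                    best[m, j], back[m, j] = c, i
    bounds, j = [], n
    for m in range(k, 0, -1):                    # walk the back-pointers
        j = int(back[m, j])
        bounds.append(j)
    return sorted(bounds)[1:]                    # drop the leading 0


if __name__ == "__main__":
    text = ["the cat chased the mouse", "the mouse fled the cat",
            "the market fell sharply", "the market recovered sharply"]
    S = similarity_matrix(text, stop_words={"the"})
    print(segment(anisotropic_diffusion(S), k=2))
```

Calling `segment` with different values of `k` on the same diffused matrix gives coarser or finer groupings, which is one simple way to approximate the zoom-in and zoom-out behavior mentioned in the abstract. The step size is kept at or below 0.25 so that the four-neighbor update remains stable.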