Linear text segmentation plays an important role in many natural language processing tasks. Many algorithms have been proposed and shown to improve its performance. However, previous approaches often suffer from either low segmentation accuracy or high computational complexity. Parameter setting is another critical problem in some algorithms. Manual assignment is one way to address it, but it increases the user's burden, and the parameters provided may not always suit the real metadata of a text. In this paper, a hybrid algorithm, TSHAC-DPSO, is proposed to tackle these problems. First, a novel linear Text Segmentation algorithm based on Hierarchical Agglomerative Clustering (TSHAC) rapidly generates a satisfactory solution without an auxiliary knowledge base, parameter setting, or user involvement. Then an efficient evolutionary algorithm, Discrete Particle Swarm Optimization (DPSO), is adopted to generate the globally optimal solution by refining the solution created by TSHAC. TSHAC-DPSO combines the merits of both algorithms, which not only improves the accuracy of linear text segmentation but also makes execution more efficient and flexible. The experimental results show that TSHAC-DPSO provides segmentation accuracy comparable to that of several well-known linear text segmentation algorithms.
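To make the first stage described above more concrete, the following is a minimal sketch of agglomerative clustering applied to linear text segmentation. It is not the authors' TSHAC: the abstract gives no algorithmic details, so the bag-of-words representation, the cosine similarity, the function names `cosine` and `segment`, and the `num_segments` stopping rule are all assumptions made for illustration. In particular, TSHAC is stated to need no parameter setting, whereas this sketch takes the target segment count as an input. The DPSO refinement stage is not shown.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences: list[str], num_segments: int) -> list[list[str]]:
    """Greedy agglomerative merging restricted to adjacent segments.

    Hypothetical illustration: num_segments is an assumed stopping rule,
    not something taken from the paper.
    """
    segments = [[s] for s in sentences]
    bags = [Counter(s.lower().split()) for s in sentences]
    while len(segments) > max(num_segments, 1):
        # Only neighbours may merge, so every cluster stays a contiguous span.
        sims = [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
        bags[i:i + 2] = [bags[i] + bags[i + 1]]
    return segments
```

Restricting merges to neighbouring segments is what distinguishes this from ordinary document clustering: the output is always a partition of the text into contiguous spans, which is the form a linear segmentation requires.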