Elements of information theory
Elements of information theory
Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Topic segmentation with an aspect hidden Markov model
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Correlating multilingual documents via bipartite graph modeling
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Topic-based document segmentation with probabilistic latent semantic analysis
Proceedings of the eleventh international conference on Information and knowledge management
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Domain-independent text segmentation using anisotropic diffusion and dynamic programming
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
The Journal of Machine Learning Research
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Text segmentation with multiple surface linguistic cues
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
A generalized maximum entropy approach to bregman co-clustering and matrix approximation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Entropy-based criterion in categorical clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Statistical models for topic segmentation
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
A statistical model for domain-independent text segmentation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Multi-way distributional clustering via pairwise interactions
ICML '05 Proceedings of the 22nd international conference on Machine learning
Multi-task text segmentation and alignment based on weighted mutual information
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Extracting shared topics of multiple documents
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Mining, indexing, and searching for textual chemical molecule information on the web
Proceedings of the 17th international conference on World Wide Web
Automatic image annotation via local multi-label classification
CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Integrating Cross-Language Hierarchies and Its Application to Retrieving Relevant Documents
ACM Transactions on Asian Language Information Processing (TALIP)
A Wavelet-Based Model to Recognize High-Quality Topics on Web Forum
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Efficient linear text segmentation based on information retrieval techniques
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
Independent informative subgraph mining for graph information retrieval
Proceedings of the 18th ACM conference on Information and knowledge management
Learning to rank graphs for online similar graph search
Proceedings of the 18th ACM conference on Information and knowledge management
Towards the Extraction of Intelligence about Competitor from the Web
WSKS '09 Proceedings of the 2nd World Summit on the Knowledge Society: Visioning and Engineering the Knowledge Society. A Web Science Perspective
Focused multi-document summarization: human summarization activity vs. automated systems techniques
Journal of Computing Sciences in Colleges
Unsupervised discourse segmentation of documents with inherently parallel structure
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Multi-document topic segmentation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Topic detection by topic model induced distance using biased initiation
AST/UCMA/ISA/ACN'10 Proceedings of the 2010 international conference on Advances in computer science and information technology
Identifying, Indexing, and Ranking Chemical Formulae and Chemical Names in Digital Documents
ACM Transactions on Information Systems (TOIS)
Structural topic model for latent topical structure analysis
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Optimizing temporal topic segmentation for intelligent text visualization
Proceedings of the 2013 international conference on Intelligent user interfaces
Effective automatic image annotation via integrated discriminative and generative models
Information Sciences: an International Journal
Integrated Computer-Aided Engineering
Hi-index | 0.00 |
Topic detection and tracking and topic segmentation play an important role in capturing the local and sequential information of documents. Previous work in this area usually focuses on single documents, although similar multiple documents are available in many domains. In this paper, we introduce a novel unsupervised method for shared topic detection and topic segmentation of multiple similar documents based on mutual information (MI) and weighted mutual information (WMI) that is a combination of MI and term weights. The basic idea is that the optimal segmentation maximizes MI (or WMI). Our approach can detect shared topics among documents. It can find the optimal boundaries in a document, and align segments among documents at the same time. It also can handle single-document segmentation as a special case of the multi-document segmentation and alignment. Our methods can identify and strengthen cue terms that can be used for segmentation and partially remove stop words by using term weights based on entropy learned from multiple documents. Our experimental results show that our algorithm works well for the tasks of single-document segmentation, shared topic detection, and multi-document segmentation. Utilizing information from multiple documents can tremendously improve the performance of topic segmentation, and using WMI is even better than using MI for the multi-document segmentation.