Detecting the semantic coherence of a document is a challenging task with applications in areas such as text segmentation and categorization. This paper attempts to distinguish a 'semantically coherent' true document from a 'randomly generated' false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics while a false document is made up of many, it is asserted that the entropy of the topic distribution will be lower for a true document than for a false one. This hypothesis is tested on several sets of false documents generated by various methods, and the entropy measure is found to be useful for fake content detection applications.
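The entropy premise can be illustrated with a minimal sketch. It assumes each document has already been reduced to a vector of topic proportions (for example, by an LDA implementation). The function name `topic_entropy` and the two example vectors are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact entropy definition and any decision threshold are not given here.

```python
import math
from typing import Sequence


def topic_entropy(theta: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a document's topic distribution.

    `theta` holds non-negative topic weights; they are normalized here
    so that unnormalized topic counts are also accepted.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    entropy = 0.0
    for weight in theta:
        p = weight / total
        if p > 0:  # the term 0 * log(0) is taken as 0
            entropy -= p * math.log(p)
    return entropy


# Hypothetical topic proportions, for illustration only.
coherent_doc = [0.90, 0.05, 0.03, 0.02]   # mass concentrated on one topic
scattered_doc = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly over topics

print(topic_entropy(coherent_doc) < topic_entropy(scattered_doc))
```

Under these assumptions the concentrated distribution yields the lower entropy, and the uniform distribution over K topics attains the maximum value log K, which is the direction of the effect the hypothesis relies on.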