Web document summarization computes a summary for a set of related articles that gives the user a general view of the events they describe. One objective of summarization is that the summary sentences cover the different events in the documents in as few sentences as possible. The Dirichlet Distribution Model (DDM) can break these documents down into different sentences or events. However, to reduce the information that the summary sentences share, the sentences need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation. Centroid Value Decomposition yields orthogonal representations of vectors; by representing sentences as vectors, our proposed DDCM obtains sentences that are orthogonal to each other. Thus DDM gives us the different sentences in the documents, and the Centroid Model finds the words that best represent these sentences. The goal of this paper is to find a minimum number of high-quality features by generating the best summarization for web document classification. We conducted experiments with a range of centroid-based summarization approaches and obtained effective classification results. Experimental results show that the proposed DDCM summarization-based classification approach is more accurate than full-text-based classification.
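The orthogonality argument can be illustrated with a small sketch. This is not the DDCM method itself, whose details the abstract does not give. It is a simple greedy stand-in that assumes bag-of-words sentence vectors and cosine similarity, and picks sentences whose vectors are as close to orthogonal as possible. The helper names `bow`, `cosine`, and `select_low_overlap`, and the choice to seed with the first sentence, are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words term-frequency vector (hypothetical helper)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 means orthogonal."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_low_overlap(sentences, k):
    """Greedily pick up to k sentences with minimal mutual similarity.

    Assumption: the first sentence seeds the summary. Each later pick is
    the candidate whose highest similarity to any chosen sentence is lowest.
    """
    if k <= 0 or not sentences:
        return []
    vectors = [bow(s) for s in sentences]
    chosen = [0]
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, None
        for i in range(len(sentences)):
            if i in chosen:
                continue
            score = max(cosine(vectors[i], vectors[j]) for j in chosen)
            if best_score is None or score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return [sentences[i] for i in chosen]


docs = [
    "the flood hit the town",
    "the flood hit the city",
    "election results were announced",
]
summary = select_low_overlap(docs, 2)
```

In this toy input the second sentence shares most of its words with the first, so the greedy step skips it in favour of the third sentence, whose vector is orthogonal to the first (cosine similarity 0).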