Because Web pages embed a large variety of noisy information, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries written by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM), when augmented by any of the summarization approaches, improve by more than 5.0% over pure-text-based classification algorithms. We further introduce an ensemble method that combines the different summarization algorithms. The ensemble summarization method achieves an improvement of more than 12.0% over pure-text-based methods.
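The idea of combining several summarizers to strip noise before classification can be illustrated with a small sketch. The abstract does not say which component summarizers are used or how their outputs are merged, so everything below is an assumption made for illustration: the two scorers (word frequency and sentence position), the equal-weight averaging, the stopword list, and all function names. It is not the paper's method, and the classifier that would consume the summary is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "here"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Lowercased alphanumeric words with stopwords removed.
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS]


def frequency_scores(sentences):
    # Score a sentence by the mean page-level frequency of its content words.
    counts = Counter(w for s in sentences for w in tokens(s))
    scores = []
    for s in sentences:
        ws = tokens(s)
        scores.append(sum(counts[w] for w in ws) / len(ws) if ws else 0.0)
    return scores


def position_scores(sentences):
    # Earlier sentences score higher (a simple lead-bias heuristic).
    n = len(sentences)
    return [(n - i) / n for i in range(n)]


def normalize(scores):
    # Rescale so the best sentence of each scorer gets 1.0.
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]


def ensemble_summary(text, k=2, scorers=(frequency_scores, position_scores)):
    # Average the normalized scores of all scorers with equal weights,
    # then keep the k best sentences in their original order.
    sentences = split_sentences(text)
    combined = [0.0] * len(sentences)
    for scorer in scorers:
        for i, s in enumerate(normalize(scorer(sentences))):
            combined[i] += s / len(scorers)
    ranked = sorted(range(len(sentences)), key=lambda i: -combined[i])
    return " ".join(sentences[i] for i in sorted(ranked[:k]))
```

In a pipeline of the kind the abstract describes, the string returned by `ensemble_summary` would replace the full page text as input to an NB or SVM classifier. Equal weighting is the simplest possible combination; learned or tuned weights are an obvious alternative.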