The Internet is home to an ever-increasing array of products and services available to the general consumer. This trend has given rise to a distinct category of Internet search in which bargain seekers have congregated around deal collection databases, in part because traditional Internet search engines do not perform well in this domain. Unfortunately, these deal databases are costly to maintain because populating them relies heavily on human participation. This has led to interest in developing this class of Internet search. Our research focuses on leveraging machine learning and natural language processing to develop a semi-supervised Web page classifier specific to this problem. We describe the design of our classifier with respect to the machine learning model chosen and the training features selected. We compare our model's effectiveness in classifying deal versus non-deal Web pages against other popular machine learning models such as decision trees, support vector machines, and neural networks. Our results show that our proposed model performed best given the features extracted for model training and testing.