The World Wide Web has become a vast archive of diverse content. The sheer amount of information on the web makes it challenging to deliver the right information to the right users. On one hand, this abundant information is freely accessible to all web users; on the other hand, much of it may be irrelevant or even harmful to some users. For example, control and filtering mechanisms are needed to keep inappropriate or offensive material, such as pornographic websites, from reaching children. We term the ways in which a website can be reached Access Scenarios. Access Scenarios include using search engines (e.g., image search, whose results carry very little textual content), being redirected to a website via a URL, and directly typing the URL of a (pornographic) website. In this paper we propose a framework that analyzes a website from several different aspects, or information sources, and builds a classification model that aims to classify such content accurately regardless of the access scenario. Extensive experiments evaluating the resulting system illustrate the promise of the proposed approach.
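The abstract does not say how the information sources are combined. As a purely illustrative sketch, and not the authors' method, one way to keep a decision usable across access scenarios is late fusion: score each information source separately and average only over the sources that are actually present. A site reached through image search, with almost no text, can then still be judged from its URL. The `Site` schema, the flagged-term lexicon, the source weights, and the 0.5 threshold are all hypothetical stand-ins for models that would in practice be learned from labeled data.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical lexicon for illustration only; a real filter would learn
# per-source models from labeled (and possibly unlabeled) websites.
FLAGGED_TERMS = {"porn", "xxx", "adult", "sex"}


@dataclass
class Site:
    """Information sources available for one website (hypothetical schema)."""
    url: str
    text: str = ""  # visible page text; may be empty, e.g. for image search hits
    link_texts: List[str] = field(default_factory=list)  # anchor texts of links


def tokenize(s: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", s.lower())


def url_score(site: Site) -> Optional[float]:
    # Crude substring match so that e.g. "xxxvideos" still fires;
    # it also over-fires on innocent words such as "essex".
    url = site.url.lower()
    return 1.0 if any(term in url for term in FLAGGED_TERMS) else 0.0


def token_score(tokens: List[str]) -> Optional[float]:
    # Fraction of flagged tokens; None means "source unavailable".
    if not tokens:
        return None
    return sum(t in FLAGGED_TERMS for t in tokens) / len(tokens)


def text_score(site: Site) -> Optional[float]:
    return token_score(tokenize(site.text))


def link_score(site: Site) -> Optional[float]:
    return token_score(tokenize(" ".join(site.link_texts)))


SOURCES: Dict[str, Callable[[Site], Optional[float]]] = {
    "url": url_score,
    "text": text_score,
    "links": link_score,
}
# Hypothetical source weights; in practice these would be tuned on data.
WEIGHTS = {"url": 1.0, "text": 2.0, "links": 1.0}


def fused_score(site: Site) -> float:
    """Weighted average over the sources that are actually available."""
    available = {}
    for name, scorer in SOURCES.items():
        s = scorer(site)
        if s is not None:
            available[name] = s
    if not available:
        return 0.0
    total = sum(WEIGHTS[n] for n in available)
    return sum(WEIGHTS[n] * s for n, s in available.items()) / total


def is_objectionable(site: Site, threshold: float = 0.5) -> bool:
    return fused_score(site) >= threshold


# A text-less site (as from an image search) is judged on its URL alone.
print(is_objectionable(Site(url="http://example.com/xxx-videos")))  # True
```

Renormalizing the weights over the available sources is the design choice that matters here: a missing source neither counts as evidence of safe content nor blocks the decision, which is one simple way to approximate "irrespective of access scenario".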