PEBL: Web Page Classification without Negative Examples

Authors:
Hwanjo Yu;Jiawei Han;Kevin Chen-Chuan Chang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004

Abstract

Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of nonhomepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. This paper presents a framework, called Positive Example Based Learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called Mapping-Convergence (M-C), that uses only positive and unlabeled data to achieve classification accuracy as high as that of a traditional SVM trained on positive and negative data. M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.
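The two-stage loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes toy 2-D points instead of Web page features, replaces the weak mapping-stage classifier with a hypothetical distance threshold (`min_dist`), and swaps the margin-maximizing SVM for a 1-nearest-neighbor rule so that the example needs only the standard library. All function and parameter names are illustrative.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def mapping_stage(positives, unlabeled, min_dist):
    """Stand-in weak classifier (an assumption, not the paper's method):
    unlabeled points farther than min_dist from the positive centroid
    are taken as "strong" negatives."""
    center = centroid(positives)
    return [u for u in unlabeled if dist(u, center) > min_dist]


def train(positives, negatives):
    """Stand-in internal classifier: 1-nearest-neighbor instead of an SVM.
    Ties go to the positive class."""
    def is_positive(x):
        nearest_pos = min(dist(x, p) for p in positives)
        nearest_neg = min(dist(x, n) for n in negatives)
        return nearest_pos <= nearest_neg
    return is_positive


def mapping_convergence(positives, unlabeled, min_dist):
    """Grow the negative set until no unlabeled point is classified negative."""
    negatives = mapping_stage(positives, unlabeled, min_dist)
    if not negatives:
        raise ValueError("mapping stage found no strong negatives")
    remaining = [u for u in unlabeled if u not in negatives]
    rounds = 0
    while True:
        rounds += 1
        is_positive = train(positives, negatives)
        newly_negative = [u for u in remaining if not is_positive(u)]
        if not newly_negative:
            # Converged: the last classifier is the final one.
            return is_positive, negatives, rounds
        negatives = negatives + newly_negative
        remaining = [u for u in remaining if is_positive(u)]


# Toy data: labeled positives plus an unlabeled mix of hidden positives
# (the first two points) and negatives at increasing distance.
positives = [(0, 0), (1, 0), (0, 1), (1, 1)]
unlabeled = [(0.5, 0.5), (0.2, 0.8), (3, 3), (4, 4), (6, 6), (9, 9), (10, 8)]

is_positive, negatives, rounds = mapping_convergence(positives, unlabeled, min_dist=10.0)
print(rounds, sorted(negatives))
```

In this toy run the mapping stage labels only the two most distant points as strong negatives, and each convergence round pulls one more unlabeled point into the negative set until only the hidden positives remain. The nearest-neighbor rule is used here only because, like a maximum-margin classifier, it places the boundary between the closest opposing points; a real setup would use an SVM as the abstract describes.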