A rough set approach to classifying web page without negative examples

Authors:
Qiguo Duan;Duoqian Miao;Kaimin Jin
Affiliations:
Department of Computer Science and Technology, Tongji University, Shanghai, China and The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, Shanghai, China;Department of Computer Science and Technology, Tongji University, Shanghai, China and The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, Shanghai, China;Department of Computer Science and Technology, Tongji University, Shanghai, China and The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, Shanghai, China
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007



Abstract

This paper studies the problem of building Web page classifiers from positive and unlabeled examples, and proposes a more principled technique for solving it, based on tolerance rough sets and the Support Vector Machine (SVM). The method uses tolerance classes to approximate the concepts that exist in Web pages and to enrich the representation of those pages, and from this enriched representation it draws an initial approximation of the negative examples. It then iteratively runs an SVM to build a margin-maximizing classifier, progressively improving the approximation of the negative examples. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. Experimental results show that the novel method significantly outperforms existing methods.
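The abstract does not spell out how tolerance classes are built or how the first negative examples are chosen, so the following is only a minimal sketch under stated assumptions. It assumes the common tolerance rough set formulation in which two terms are tolerant when they co-occur in at least `theta` documents, and in which a page is enriched by taking its upper approximation. The function names, the `theta` parameter, and in particular the `initial_negatives` rule (treat an unlabeled page as negative when its enriched term set shares nothing with the enriched positive pages) are hypothetical illustrations, not the authors' procedure. The iterative SVM step is left out.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Enriched representation: every term whose tolerance class
    overlaps the terms actually present in the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def initial_negatives(positive, unlabeled, classes):
    """Hypothetical selection rule (an assumption, not from the paper):
    indices of unlabeled documents whose enriched term set is disjoint
    from the enriched term sets of all positive documents."""
    pos_terms = set()
    for doc in positive:
        pos_terms |= upper_approximation(doc, classes)
    return [
        i for i, doc in enumerate(unlabeled)
        if not (upper_approximation(doc, classes) & pos_terms)
    ]


# Toy data: each document is a list of already-tokenized terms.
positive = [["rough", "set", "svm"], ["rough", "set", "tolerance"]]
unlabeled = [
    ["set", "tolerance", "svm"],
    ["football", "league"],
    ["football", "league", "goal"],
]
classes = tolerance_classes(positive + unlabeled, theta=2)
negatives = initial_negatives(positive, unlabeled, classes)
```

In a full pipeline, the pages indexed by `negatives` would seed the negative class for the first SVM run, and later runs would move further unlabeled pages into the negative set until none are added. That loop is described only at a high level in the abstract and is not reproduced here.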