Web classification of conceptual entities using co-training

Authors:
Aixin Sun;Ying Liu;Ee-Peng Lim
Affiliations:
School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore;Department of Industrial and Systems Engineering, Hong Kong Polytechnic University, Hong Kong Special Administrative Region;School of Information Systems, Singapore Management University, Stamford Road, Singapore 178902, Singapore
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Social networking websites, which profile objects with predefined attributes and relationships, often rely heavily on their users to contribute the required information. We have observed, however, that many web pages are created collectively according to the composition of some physical or abstract entity, e.g., a company, a person, or an event. Furthermore, users often like to organize pages into conceptual categories for better search and retrieval, which makes it feasible to extract relevant attributes and relationships from the web. Given a set of entities, each consisting of a set of web pages, we call the task of assigning pages to their corresponding conceptual categories conceptual web classification. To address this task, we propose an entity-based co-training (EcT) algorithm that learns from unlabeled examples to boost its performance. Unlike existing co-training algorithms, EcT takes into account the entity semantics hidden in web pages and requires no prior knowledge of the underlying class distribution, knowledge that is crucial to the standard co-training algorithms used in web classification. In our experiments, we evaluated EcT, standard co-training, and three other non-co-training learning methods on the Conf-425 dataset. Both EcT and co-training performed well compared with the baseline methods that required a large number of training examples.
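
For readers unfamiliar with the baseline, the following is a minimal sketch of standard two-view co-training, not of EcT itself, whose details the abstract does not give. It assumes each example has two feature views and uses a simple nearest-centroid classifier as a stand-in learner. The names `co_train`, `centroids`, `predict`, and `per_class` are hypothetical. The `per_class` argument fixes how many examples of each class every view labels per round, which illustrates the prior knowledge of class distribution that the abstract says standard co-training depends on.

```python
from math import dist


def centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(cents, x):
    """Return (label, margin); a larger margin means a more confident guess."""
    ranked = sorted(cents, key=lambda c: dist(cents[c], x))
    best = ranked[0]
    if len(ranked) == 1:
        return best, 0.0
    return best, dist(cents[ranked[1]], x) - dist(cents[best], x)


def co_train(view_a, view_b, labels, per_class, rounds=10):
    """Grow the labeled set using two views of the same examples.

    labels:    dict mapping example index -> known label
    per_class: dict mapping label -> number of examples each view adds
               per round (assumed class-distribution prior)
    """
    labels = dict(labels)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        new = {}
        for view in (view_a, view_b):
            idx = list(labels)
            cents = centroids([view[i] for i in idx], [labels[i] for i in idx])
            scored = [(predict(cents, view[i]), i) for i in unlabeled]
            for c, k in per_class.items():
                # Most confident predictions of class c not yet claimed this round.
                cands = sorted(
                    ((m, i) for (lab, m), i in scored if lab == c and i not in new),
                    reverse=True,
                )
                for _, i in cands[:k]:
                    new[i] = c
        if not new:
            break
        labels.update(new)
    return labels


# Toy data: two well-separated classes, visible in both views.
view_a = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.8, 5.2]]
view_b = [[1], [1.2], [0.9], [9], [9.1], [8.8]]
result = co_train(view_a, view_b, {0: "neg", 3: "pos"}, {"pos": 1, "neg": 1})
```

In this sketch each view trains on the shared labeled pool and contributes its most confident predictions, so examples that one view finds easy become training data for the other. Choosing `per_class` badly would skew the labeled pool, which is the dependence EcT is described as removing.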