Social networking websites, which profile objects with predefined attributes and their relationships, often rely heavily on their users to contribute the required information. We have observed, however, that many web pages are created collectively according to the composition of some physical or abstract entity, e.g., a company, a person, or an event. Furthermore, users often organize pages into conceptual categories for better search and retrieval, which makes it feasible to extract relevant attributes and relationships from the web. Given a set of entities, each consisting of a set of web pages, we call the task of assigning pages to their corresponding conceptual categories conceptual web classification. To address this task, we propose an entity-based co-training (EcT) algorithm that learns from unlabeled examples to boost its performance. Unlike existing co-training algorithms, EcT takes into account the entity semantics hidden in web pages and requires no prior knowledge of the underlying class distribution, which is crucial in the standard co-training algorithms used in web classification. In our experiments, we evaluated EcT, standard co-training, and three other non-co-training learning methods on the Conf-425 dataset. Both EcT and co-training performed well compared to the baseline methods that required a large number of training examples.
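For readers unfamiliar with the baseline, the following is a minimal sketch of standard two-view co-training, not of EcT, whose details are not given here. It assumes binary labels (0 and 1), numeric feature vectors for each view, and a toy nearest-centroid classifier chosen only to keep the example self-contained; all function and parameter names are hypothetical. The parameters `p` and `n` set how many positive and negative examples each view adds per round, which is where standard co-training typically encodes an assumed class ratio.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def train(labeled, view):
    """Fit a toy nearest-centroid model on one view: {label: centroid}."""
    by_label = {}
    for views, label in labeled:
        by_label.setdefault(label, []).append(views[view])
    return {label: centroid(points) for label, points in by_label.items()}


def score(model, x):
    """Signed confidence: above 0 favours label 1, below 0 favours label 0."""
    return dist(x, model[0]) - dist(x, model[1])


def co_train(labeled, unlabeled, p=1, n=1, rounds=10):
    """Grow the labeled set using two views of each example.

    labeled:   list of ((view_a, view_b), label); must contain both labels.
    unlabeled: list of (view_a, view_b).
    p, n:      positives / negatives each view may add per round (assumed
               to reflect the class ratio; hypothetical defaults).
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            model = train(labeled, view)
            # Sort from most confidently negative to most confidently positive.
            ranked = sorted(pool, key=lambda ex: score(model, ex[view]))
            negatives = [ex for ex in ranked if score(model, ex[view]) < 0][:n]
            positives = [ex for ex in reversed(ranked)
                         if score(model, ex[view]) > 0][:p]
            # Each view's confident guesses become training data for both views.
            for ex, label in [(e, 0) for e in negatives] + [(e, 1) for e in positives]:
                pool.remove(ex)
                labeled.append((ex, label))
    return labeled


# Two seed examples and four unlabeled ones; each view is a 1-D feature.
seed = [(((0.0,), (0.0,)), 0), (((10.0,), (10.0,)), 1)]
pool = [((1.0,), (2.0,)), ((9.0,), (8.0,)), ((2.0,), (1.0,)), ((8.0,), (9.0,))]
result = co_train(seed, pool)
```

In this sketch a badly chosen `p`-to-`n` ratio would skew the self-labeled set toward one class, which illustrates why an approach that needs no prior knowledge of the class distribution is attractive.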