Vertical search engines make it possible to search web pages in a specific domain, such as products, restaurants, or academic papers, and to present users with only the information of interest. Gathering and integrating such objects from multiple web pages into a single system provides a useful facility for users. Placing objects extracted from multiple data sources into a single hierarchical structure is a challenging classification problem, especially when the objects have few attributes. In this work, we propose a confidence-based incremental Naïve Bayesian approach to categorization, focusing on the product domain. The incremental approach extends the training set and retrains the classifier as new objects are assigned to a category with high confidence. The ordering of the product data is taken into account as well. The proposed approach is applied to a vertical search engine that collects product data from several online stores.
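The incremental idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the multinomial model with Laplace smoothing, the whitespace tokenization, the 0.9 confidence threshold, and the names NaiveBayes and categorize_incrementally are all assumptions made for illustration. Products are processed in the order given, so an early confident assignment can influence later ones.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with Laplace smoothing (assumed model)."""

    def __init__(self):
        self.class_docs = Counter()              # number of examples per category
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.vocab = set()

    def add(self, text, category):
        """Add one labeled example; updating the counts has the effect of retraining."""
        tokens = text.lower().split()
        self.class_docs[category] += 1
        self.word_counts[category].update(tokens)
        self.vocab.update(tokens)

    def posterior(self, text):
        """Return a dict mapping each known category to P(category | text)."""
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        log_scores = {}
        for cat, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[cat].values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[cat][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            log_scores[cat] = score
        # Normalize in log space to avoid numerical underflow.
        peak = max(log_scores.values())
        scaled = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(scaled.values())
        return {c: v / norm for c, v in scaled.items()}


def categorize_incrementally(model, products, threshold=0.9):
    """Assign each product to its most probable category, in the given order.

    A product whose top posterior reaches the (assumed) threshold is added to
    the training data, so it affects how later products are classified.
    Returns a list of (text, category, was_added) tuples.
    """
    assigned = []
    for text in products:
        probs = model.posterior(text)
        best = max(probs, key=probs.get)
        confident = probs[best] >= threshold
        if confident:
            model.add(text, best)
        assigned.append((text, best, confident))
    return assigned
```

To use the sketch, seed the model with a few labeled products through add, then pass the unlabeled product descriptions to categorize_incrementally. Low-confidence products still receive a category but do not enter the training set, which limits the risk of reinforcing early mistakes.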