We present a method that automatically classifies structured deep Web databases into a predefined topic hierarchy. We assume that every node of the topic hierarchy contains some manually classified databases, which serve as training databases. Each training database is probed with queries constructed from the node titles of the topic hierarchy, and the result counts reported by the database are used to represent its content. A new database can then be probed with the same set of queries and assigned to the node whose training databases are most similar to it. Specifically, a support vector machine classifier is trained at each internal node of the topic hierarchy on these training databases, and a new database is classified into the hierarchy top-down, level by level. A feature extension method is proposed to create discriminative features. Experiments on real structured Web databases collected from the Internet show that this classification method is quite accurate.
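
The probing and top-down classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it swaps the per-node support vector machine for a nearest-centroid rule with cosine similarity, and it omits the feature extension method, which the abstract does not specify. The hierarchy, the `probe` function, the dictionary-based stand-in for a database, and all result counts are hypothetical and exist only for illustration.

```python
import math

# Hypothetical topic hierarchy: internal node -> list of child nodes.
HIERARCHY = {
    "Root": ["Computers", "Health"],
    "Computers": ["Programming", "Hardware"],
    "Health": ["Fitness", "Medicine"],
}

# Probe queries are built from the node titles (every non-root node here).
QUERIES = [child for children in HIERARCHY.values() for child in children]


def probe(database):
    """Return normalized result counts of a database for every probe query.

    `database` is a stand-in (assumption): a dict mapping a query term to the
    result count that a real deep Web source would report for that query.
    """
    counts = [float(database.get(q.lower(), 0)) for q in QUERIES]
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leaves_under(node):
    """Return the set of leaf nodes in the subtree rooted at `node`."""
    children = HIERARCHY.get(node)
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child)
    return result


def train(training):
    """Build one centroid per (internal node, child) pair.

    `training` is a list of (database, leaf_label) pairs. A real system would
    fit an SVM at each internal node; centroids are a simplification.
    """
    model = {}
    for node, children in HIERARCHY.items():
        for child in children:
            members = leaves_under(child)
            vectors = [probe(db) for db, label in training if label in members]
            if vectors:
                model[(node, child)] = [
                    sum(column) / len(vectors) for column in zip(*vectors)
                ]
    return model


def classify(model, database):
    """Descend from the root, choosing the most similar child at each level."""
    features = probe(database)
    node = "Root"
    while node in HIERARCHY:
        candidates = [c for c in HIERARCHY[node] if (node, c) in model]
        if not candidates:
            break
        node = max(candidates, key=lambda c: cosine(features, model[(node, c)]))
    return node


# Made-up result counts for four training databases, one per leaf.
TRAINING = [
    ({"computers": 900, "programming": 700, "hardware": 50, "health": 5}, "Programming"),
    ({"computers": 800, "programming": 40, "hardware": 600, "health": 3}, "Hardware"),
    ({"health": 900, "fitness": 650, "medicine": 30, "computers": 10}, "Fitness"),
    ({"health": 850, "fitness": 20, "medicine": 700, "computers": 8}, "Medicine"),
]
MODEL = train(TRAINING)

NEW_DB = {"computers": 500, "programming": 20, "hardware": 450, "health": 1}
print(classify(MODEL, NEW_DB))
```

Because each decision is made only among the children of the current node, an error at an upper level cannot be corrected further down; this is a general property of top-down hierarchical classification rather than something specific to this sketch.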