TODWEB: training-less ontology based deep web source classification

Authors:
Umara Noor; Zahid Rashid; Azhar Rauf
Affiliations:
Islamic University (DCS, IIUI), Islamabad, Pakistan; School of Electrical Engineering and Computer Science (SEECS, NUST), Islamabad, Pakistan; University of Peshawar (UOP), Peshawar, Pakistan
Venue:
Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
Year:
2011

Abstract

Today, the deep web comprises a large part of web content. Because of this large volume of data, technologies related to the deep web have gained increasing attention in recent years. The deep web mostly consists of online domain-specific databases, which are accessed through web query interfaces. These highly relevant domain-specific databases are better suited to satisfying the information needs of users. To make the extraction of relevant information easier, deep web databases need to be classified into subject-specific, self-descriptive categories. In this paper we present TODWEB, a novel training-less classification approach for automatic deep web source classification, based on common-sense world knowledge (in the form of an ontology or any external lexical resource). It will help in building highly scalable, domain-focused, and efficient semantic information retrieval systems (i.e., metasearch engines and search engine directories). An important aspect of this approach is that the classification method is completely training-less and uses the Wikipedia category network and domain-independent ontologies to analyze the semantics in the meta-information of deep web sources. The large number of fine-grained Wikipedia categories is employed to analyze semantic relatedness among concepts, and finally the URL of the deep web search source is mapped to the category hierarchy offered by Wikipedia. Experiments conducted on a collection of search sources show that this approach yields a highly accurate and fine-grained classification compared to existing approaches, nearly identical to the results achieved by manual classification.
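The general idea of mapping a source's meta-information onto a category hierarchy without any training data can be illustrated with a minimal Python sketch. This is not the TODWEB algorithm itself, whose details the abstract does not give. The toy category graph `PARENTS`, the term lexicon `TERM_TO_CATEGORY`, the decay factor, and the function names are all hypothetical stand-ins for the Wikipedia category network and the ontology lookups.

```python
import re
from collections import Counter

# Hypothetical toy slice of a category network: category -> parent categories.
# A real system would draw these links from the Wikipedia category network.
PARENTS = {
    "Hotels": ["Tourism"],
    "Airlines": ["Tourism"],
    "Tourism": ["Travel"],
    "Novels": ["Books"],
    "Books": ["Literature"],
}

# Hypothetical lexicon standing in for an ontology or lexical-resource lookup.
TERM_TO_CATEGORY = {
    "hotel": "Hotels",
    "hotels": "Hotels",
    "flight": "Airlines",
    "flights": "Airlines",
    "novel": "Novels",
    "author": "Books",
}


def ancestors(category, max_depth=2):
    """Yield (category, depth) for the category itself and its ancestors."""
    frontier = [(category, 0)]
    seen = set()
    while frontier:
        node, depth = frontier.pop()
        if node in seen or depth > max_depth:
            continue
        seen.add(node)
        yield node, depth
        for parent in PARENTS.get(node, []):
            frontier.append((parent, depth + 1))


def classify(meta_text, max_depth=2, decay=0.6):
    """Pick the category best supported by the terms in the meta-information.

    Each matched term votes for its category and, with geometrically
    decaying weight, for that category's ancestors (an assumed scoring rule).
    Returns None when no term is recognized.
    """
    scores = Counter()
    for token in re.findall(r"[a-z]+", meta_text.lower()):
        category = TERM_TO_CATEGORY.get(token)
        if category is None:
            continue
        for node, depth in ancestors(category, max_depth):
            scores[node] += decay ** depth
    if not scores:
        return None
    # Break score ties by category name so the result is deterministic.
    return max(scores, key=lambda c: (scores[c], c))


if __name__ == "__main__":
    print(classify("Search cheap hotels and flights worldwide"))
    print(classify("Find a novel by author"))
```

With this scoring rule, a source whose meta-information mentions several sibling concepts is assigned to their shared parent, while a source mentioning a single concept stays in the more specific category. No labeled examples are needed; all of the knowledge comes from the external category graph.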