Domain-independent classification for deep web interfaces

Authors:
Yingjun Li; Siwei Wang; Derong Shen; Tiezheng Nie; Ge Yu
Affiliations:
College of Information Science and Engineering, Northeastern University, Shenyang, China (all five authors)
Venue:
WAIM'10: Proceedings of the 11th International Conference on Web-Age Information Management
Year:
2010

Abstract

Deep Web data sources provide a tremendous amount of high-quality structured data. However, because the deep web is so large that its domains are hard to predefine, its query interfaces must be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying FIE, an algorithm we propose. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to FGSTD, a similarity measure presented in this paper. Experiments demonstrate that our approach clusters interfaces well in a domain-independent manner.
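The second and third stages can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each interface is represented by a hand-made set of terms (a hypothetical stand-in for FIE output, which the abstract does not specify), frequent itemsets are mined with the classic Apriori algorithm (the abstract does not name its mining algorithm), "AP" is assumed to mean affinity propagation, and Jaccard similarity is a hypothetical stand-in for FGSTD, whose definition is not given here. The names `apriori`, `jaccard`, and `affinity_propagation` are illustrative only.

```python
from itertools import combinations
from statistics import median


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Sorting keeps candidate order (and thus later tie-breaking) deterministic.
    candidates = [frozenset([item]) for item in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidate k-itemsets whose support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        joined = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                joined.add(union)
        candidates = sorted(joined, key=sorted)
        k += 1
    return frequent


def jaccard(a, b):
    """Hypothetical stand-in for FGSTD: |A & B| / |A | B| for non-empty sets."""
    return len(a & b) / len(a | b)


def affinity_propagation(sim, preference=None, damping=0.5, max_iter=200):
    """Cluster n points given an n x n similarity matrix; return each point's exemplar index."""
    n = len(sim)
    if n < 2:
        return list(range(n))
    s = [list(row) for row in sim]
    if preference is None:
        # Median off-diagonal similarity, a common default preference.
        preference = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')).
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                other = second if k == best else vals[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - other)
        # Availability: a(i,k) = min(0, r(k,k) + positive r(i',k) summed over i' not in {i,k});
        # self-availability a(k,k) = positive r(i',k) summed over i' != k.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        # No point claimed itself; fall back to the single strongest candidate.
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Exemplars label themselves; every other point joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k]) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical term sets for six query interfaces (two book forms style, two car forms style).
    interfaces = [
        {"title", "author", "isbn", "publisher"},
        {"title", "author", "isbn", "keyword"},
        {"title", "author", "publisher", "keyword"},
        {"make", "model", "price", "year"},
        {"make", "model", "price", "zip"},
        {"make", "model", "year", "zip"},
    ]
    frequent = apriori(interfaces, min_support=3)
    itemsets = sorted(frequent, key=lambda s: (len(s), sorted(s)))
    sim = [[jaccard(x, y) for y in itemsets] for x in itemsets]
    for itemset, label in zip(itemsets, affinity_propagation(sim)):
        print(sorted(itemset), "->", sorted(itemsets[label]))
```

Affinity propagation does not need the number of clusters in advance; the diagonal "preference" controls how many exemplars emerge, which fits a setting where domains cannot be predefined. Similarity matrices with many ties (as Jaccard often produces) can slow convergence, so a real system would likely need a better-tuned measure and damping.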