The amount of high-quality data in Web databases has been increasing dramatically. To utilize this wealth of information, measuring the similarity between Web databases has been proposed for many applications, such as clustering and top-k recommendation. Most existing methods represent a Web database by text, taken either from its query interface or from the Web page where the interface is located. These methods are limited because the text may contain many noisy words, which are rarely discriminative and cannot capture the characteristics of the Web database. To better measure the similarity between Web databases, we introduce a novel Web database similarity method. We represent each Web database by the categories of its records, which can be automatically extracted from the Web site where the database is located. The record categories are of high quality and capture the characteristics of the corresponding Web databases effectively. To make better use of the record categories, we measure the similarity between Web databases based on a unified category hierarchy, and we propose an effective method to construct this hierarchy from the record categories obtained from all the Web databases. We conducted experiments on real Chinese Web databases to evaluate our method. The results show that, compared with the baseline method, our method is effective in clustering and top-k recommendation for Web databases, and that it can be used in real Web database applications.
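To make the idea of hierarchy-based similarity concrete, the following is a minimal sketch, not the method of the paper. It assumes each Web database is summarized as a mapping from record category to record count, that the unified hierarchy is given as a child-to-parent map, and that similarity is the cosine of count vectors after each count is propagated to all ancestor categories. The hierarchy `PARENT`, the functions `expand` and `similarity`, and the toy category names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical unified category hierarchy: child -> parent (the root maps to None).
PARENT = {
    "root": None,
    "books": "root",
    "electronics": "root",
    "novels": "books",
    "textbooks": "books",
    "phones": "electronics",
}


def expand(profile, parent=PARENT):
    """Propagate each category's record count to itself and all its ancestors.

    The root is skipped because every database shares it, so it carries
    no discriminative information.
    """
    vec = Counter()
    for category, count in profile.items():
        node = category
        while parent[node] is not None:  # stop once the root is reached
            vec[node] += count
            node = parent[node]
    return vec


def similarity(a, b, parent=PARENT):
    """Cosine similarity between two hierarchy-expanded category profiles."""
    va, vb = expand(a, parent), expand(b, parent)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


# Toy profiles (assumed data): category -> number of records.
db_novels = {"novels": 10}
db_textbooks = {"textbooks": 10}
db_phones = {"phones": 10}

print(similarity(db_novels, db_textbooks))  # share the ancestor "books"
print(similarity(db_novels, db_phones))     # share only the root
```

Under these assumptions, two databases with no category in common still receive a nonzero similarity when their categories share an ancestor below the root, which a flat comparison of category names would score as zero.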