The Web has been rapidly “deepened” by the prevalence of online databases. On this “deep Web,” numerous sources are structured, providing schema-rich data. Their schemas define the object domain and its query capabilities. This paper proposes clustering sources by their query schemas, which is critical for enabling both source selection and query mediation, because it organizes sources with similar query capabilities. Abstractly, this problem is essentially one of clustering categorical data (by viewing each query schema as a transaction). Our approach hypothesizes that “homogeneous sources” are characterized by the same hidden generative models for their schemas. To find clusters governed by such statistical distributions, we propose a novel objective function, model-differentiation, which employs principled hypothesis testing to maximize statistical heterogeneity among clusters. Our evaluation shows that, on clustering Web query schemas, the model-differentiation function outperforms existing objective functions when used with the hierarchical agglomerative clustering algorithm.
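To make the idea concrete, the following is a minimal sketch of hierarchical agglomerative clustering over query schemas treated as transactions. It is not the paper's model-differentiation function, whose exact definition is not given in this abstract. As a stand-in, it assumes each cluster is summarized by attribute counts (a multinomial model) and uses the Pearson chi-square homogeneity statistic as the merge criterion, so the two clusters that look most like samples from the same distribution are merged first. The names `chi_square` and `cluster_schemas`, the example schemas, and the fixed target cluster count `k` are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def chi_square(counts_a, counts_b):
    """Pearson chi-square statistic for the 2 x V table of attribute counts.

    A small value means the two clusters look like samples from the same
    attribute distribution (assumed stand-in for the paper's test).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    total = total_a + total_b
    stat = 0.0
    for attr in set(counts_a) | set(counts_b):
        column = counts_a[attr] + counts_b[attr]
        for observed, row_total in ((counts_a[attr], total_a),
                                    (counts_b[attr], total_b)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def cluster_schemas(schemas, k):
    """Greedy agglomerative clustering of non-empty schemas into k clusters."""
    clusters = [[i] for i in range(len(schemas))]   # schema indices per cluster
    counts = [Counter(s) for s in schemas]          # attribute counts per cluster
    while len(clusters) > k:
        # Merge the pair of clusters that is statistically most homogeneous.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: chi_square(counts[p[0]], counts[p[1]]))
        clusters[i] += clusters[j]
        counts[i] += counts[j]
        del clusters[j], counts[j]                  # i < j, so index i is unaffected
    return clusters


# Hypothetical query schemas: two book sources and two airfare sources.
schemas = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"from", "to", "departure_date"},
    {"from", "to", "return_date"},
]
print(cluster_schemas(schemas, 2))  # → [[0, 1], [2, 3]]
```

In this toy run the book schemas and the airfare schemas end up in separate clusters because each pair shares attributes, which keeps its chi-square statistic low. A fuller treatment would presumably replace the fixed `k` with a significance threshold on the test, but that choice is outside what the abstract specifies.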