We consider the problem of finding related tables in a large corpus of heterogeneous tables. Detecting related tables gives users a powerful tool for enhancing their tables with additional data and enables effective reuse of available public data. Our first contribution is a framework that captures several types of relatedness, including tables that are candidates for joins and tables that are candidates for union. Our second contribution is a set of algorithms for detecting related tables that can be either unioned or joined. We describe a set of experiments demonstrating that our algorithms produce highly related tables. We also show that we can often improve the results of table search by promoting lower-ranked tables that are closely related to the top-ranked tables. Finally, we describe how to scale up our algorithms and show the results of running them on a corpus of over a million tables extracted from Wikipedia.
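To make the two kinds of relatedness concrete, here is a minimal Python sketch. It is not the paper's algorithm: the table representation, the function names (`union_score`, `join_score`), and the use of plain Jaccard overlap are all assumptions chosen for illustration. The idea is only that union candidates tend to share a schema, while join candidates tend to share values in some column.

```python
from typing import Dict, List, Set

# Hypothetical representation: column name -> list of cell values.
Table = Dict[str, List[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def union_score(t1: Table, t2: Table) -> float:
    """Assumed heuristic: overlap of (case-folded) column names."""
    return jaccard({c.lower() for c in t1}, {c.lower() for c in t2})


def join_score(t1: Table, t2: Table) -> float:
    """Assumed heuristic: best value overlap between any pair of columns."""
    return max(
        (jaccard(set(v1), set(v2)) for v1 in t1.values() for v2 in t2.values()),
        default=0.0,
    )


# Toy tables, invented for this example.
capitals = {
    "Country": ["France", "Japan", "Peru"],
    "Capital": ["Paris", "Tokyo", "Lima"],
}
more_capitals = {
    "country": ["Kenya", "Chile"],
    "capital": ["Nairobi", "Santiago"],
}
continents = {
    "Country": ["France", "Japan", "Kenya"],
    "Continent": ["Europe", "Asia", "Africa"],
}

print(union_score(capitals, more_capitals))  # same schema, different rows
print(join_score(capitals, continents))      # shared values in "Country"
```

In this toy setting, `capitals` and `more_capitals` look like union candidates (identical schema, disjoint rows), whereas `capitals` and `continents` look like join candidates (partly overlapping `Country` values, different remaining columns). A real system over a million tables would need far more robust signals than exact string overlap.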