We present the design of a structured search engine that returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web, because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. In this paper we concentrate on one concrete task: given a set of Web tables T1, ..., Tn and a query Q with q sets of keywords Q1, ..., Qq, decide for each Ti whether it is relevant to Q and, if so, identify the mapping between the columns of Ti and the query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues: matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism for exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a corpus of 25 million web tables show a significant boost in accuracy over baseline IR methods.
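The per-table column-mapping step can be illustrated with a small sketch. It assumes that every (table column, query column) pair already has a compatibility score. The scores, the `na_score` for leaving a table column unmapped, and the function name `best_column_mapping` are all hypothetical and do not come from the paper. Under those assumptions, the best mapping for a single table is a maximum-weight bipartite matching in which each query column is used at most once. The sketch solves that matching exactly with a dynamic program over subsets of query columns, which stays cheap because queries have only a few columns. It does not model the paper's joint, cross-table inference with constrained graph cuts.

```python
from typing import Dict, List, Optional, Tuple

NA = None  # hypothetical label: this table column maps to no query column


def best_column_mapping(
    scores: List[List[float]],  # scores[c][k]: assumed score of table column c -> query column k
    na_score: float = 0.0,      # assumed score for leaving a table column unmapped
) -> Tuple[float, List[Optional[int]]]:
    """Exact max-weight one-to-one mapping of table columns to query columns.

    Each query column is used at most once, and table columns may stay unmapped.
    Dynamic program over subsets of query columns: O(m * 2^q * q) time for
    m table columns and q query columns.
    """
    q = len(scores[0]) if scores else 0
    # best[mask] = (total score, labels so far) using exactly the query columns in mask
    best: Dict[int, Tuple[float, List[Optional[int]]]] = {0: (0.0, [])}
    for row in scores:  # process table columns one at a time
        nxt: Dict[int, Tuple[float, List[Optional[int]]]] = {}
        for mask, (total, labels) in best.items():
            # Option 1: leave this table column unmapped.
            cand = (total + na_score, labels + [NA])
            if mask not in nxt or cand[0] > nxt[mask][0]:
                nxt[mask] = cand
            # Option 2: map it to a query column k that is still unused.
            for k in range(q):
                if not mask & (1 << k):
                    new_mask = mask | (1 << k)
                    cand = (total + row[k], labels + [k])
                    if new_mask not in nxt or cand[0] > nxt[new_mask][0]:
                        nxt[new_mask] = cand
        best = nxt
    # The best mapping may use any subset of query columns.
    return max(best.values(), key=lambda t: t[0])
```

Take the scores `[[0.9, 0.8], [0.7, 0.1]]` as an example. A greedy choice would map table column 0 to query column 0, for a total of 1.0. The matching instead returns the labels `[1, 0]`, for a total of 1.5. A table could then be judged relevant when enough query columns are mapped, although that threshold rule is also an assumption and not the paper's relevance model.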