A human-machine method for web table understanding

Authors:
Guoliang Li
Affiliations:
Department of Computer Science, Tsinghua University, Beijing, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

Tabular data on the Web has become a rich source of structured data that ordinary users can usefully explore. Because of this potential, Web tables have recently attracted a number of studies that aim to understand their semantics and to provide effective search and exploration mechanisms over them. Table understanding aims to identify, recognize, and interpret tabular structures in order to enable a variety of tasks such as data extraction, data interpretation, data integration, and search and analysis. In this paper, we propose a human-machine hybrid method for effectively understanding tables on the Web. We develop novel techniques to address four main problems in Web table understanding: Web table extraction, Web table interpretation, Web table integration, and Web table search and analysis. We also discuss some open problems in Web table understanding that need further investigation. We believe that Web table management will attract much more attention in the coming years.
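The abstract does not describe how the human-machine hybrid divides work, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes one common pattern: a machine annotator labels each table column with a confidence score, and columns below a threshold are routed to a person. The names `annotate_columns`, `toy_machine_guess`, `ask_human`, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Annotation:
    column: str
    label: str
    source: str  # "machine" or "human"


def annotate_columns(
    columns: Dict[str, List[str]],
    machine_guess: Callable[[List[str]], Tuple[str, float]],
    ask_human: Callable[[str, List[str]], str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> List[Annotation]:
    """Label each column by machine, falling back to a human when unsure."""
    results = []
    for name, cells in columns.items():
        label, confidence = machine_guess(cells)
        if confidence >= threshold:
            results.append(Annotation(name, label, "machine"))
        else:
            # Low confidence: hand the column to a person (e.g. a crowd worker).
            results.append(Annotation(name, ask_human(name, cells), "human"))
    return results


def toy_machine_guess(cells: List[str]) -> Tuple[str, float]:
    """Hypothetical stand-in classifier: the share of numeric cells decides."""
    if not cells:
        return ("unknown", 0.0)
    numeric = sum(c.replace(".", "", 1).isdigit() for c in cells)
    share = numeric / len(cells)
    if share >= 0.5:
        return ("number", share)
    return ("text", 1.0 - share)


if __name__ == "__main__":
    table = {
        "population": ["21540000", "24870000", "n/a"],  # mixed cells, low confidence
        "city": ["Beijing", "Shanghai", "Tianjin"],      # clearly textual
    }
    for annotation in annotate_columns(
        table, toy_machine_guess, lambda name, cells: "population count"
    ):
        print(annotation)
```

In this sketch the threshold controls the trade-off between human effort and annotation quality: raising it sends more columns to people, lowering it trusts the machine more often.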