Understanding tables on the web

Authors:
Jingjing Wang;Haixun Wang;Zhongyuan Wang;Kenny Q. Zhu
Affiliations:
University of Washington;Microsoft Research Asia, China;Microsoft Research Asia, China;Shanghai Jiao Tong University, China
Venue:
ER'12 Proceedings of the 31st international conference on Conceptual Modeling
Year:
2012

Abstract

The Web contains a wealth of information, and a key challenge is to make this information machine-processable. In this paper, we study how to "understand" HTML tables on the Web, which goes one step beyond finding the schemas of tables. From 0.3 billion Web documents, we obtain 1.95 billion tables, of which 0.5-1% contain information about various entities and their properties. We argue, first, that for computers to understand these tables, they must have a brain: a general-purpose knowledge taxonomy comprehensive enough to cover the concepts (of worldly facts) in a human mind. Second, we argue that understanding a table is the process of finding the right position for the table in the knowledge taxonomy. Once a table is associated with a concept in the knowledge taxonomy, it is automatically linked to all other tables associated with the same concept, as well as to tables associated with related concepts. In other words, computers understand the semantics of the tables through the interconnections of concepts in the knowledge base. We illustrate a two-phase process for this task. Our experimental results show that the approach is feasible and that it may benefit many useful applications such as web search.
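The idea of placing a table in a knowledge taxonomy and then linking tables through shared concepts can be illustrated with a minimal sketch. This is not the paper's algorithm or its knowledge base: the toy taxonomy, its counts, and the names `TOY_TAXONOMY`, `concept_scores`, `best_concept`, and `link_tables` are all assumptions made up for illustration. Each cell of a column votes for the concepts it may belong to, weighted by its share of evidence, and tables whose columns resolve to the same concept are grouped together.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (an assumption, not real data):
# concept -> {instance: evidence count for "instance is a concept"}.
TOY_TAXONOMY = {
    "country": {"china": 90, "france": 80, "brazil": 70},
    "city": {"paris": 85, "shanghai": 75, "seattle": 60},
    "company": {"microsoft": 95, "apple": 90},
    "fruit": {"apple": 40, "banana": 50},
}


def concept_scores(cells, taxonomy=TOY_TAXONOMY):
    """Sum, over all cells, each concept's share of the evidence for that cell."""
    scores = defaultdict(float)
    for cell in cells:
        key = cell.strip().lower()
        # Evidence counts for every concept that lists this cell as an instance.
        evidence = {c: inst[key] for c, inst in taxonomy.items() if key in inst}
        total = sum(evidence.values())
        for concept, count in evidence.items():
            scores[concept] += count / total
    return dict(scores)


def best_concept(cells, taxonomy=TOY_TAXONOMY):
    """Return the highest-scoring concept, or None if no cell is known."""
    scores = concept_scores(cells, taxonomy)
    return max(scores, key=scores.get) if scores else None


def link_tables(tables, taxonomy=TOY_TAXONOMY):
    """Group table ids by the concept inferred for their entity column."""
    index = defaultdict(list)
    for table_id, cells in tables.items():
        concept = best_concept(cells, taxonomy)
        if concept is not None:
            index[concept].append(table_id)
    return dict(index)


if __name__ == "__main__":
    tables = {
        "t1": ["China", "France", "Brazil"],
        "t2": ["Apple", "Microsoft"],
        "t3": ["France", "China"],
    }
    print(link_tables(tables))
```

In this sketch the ambiguous cell "Apple" is resolved by the other cells in its column, which loosely mirrors the abstract's point that a table is understood through its position among related concepts. A realistic system would need a far larger taxonomy and would also have to detect which column holds the entities.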