Annotating and searching web tables using entities, types and relationships

Authors:
Girija Limaye; Sunita Sarawagi; Soumen Chakrabarti
Affiliations:
IIT Bombay, India; IIT Bombay, India; IIT Bombay, India
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than making separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DBpedia, tables from Wikipedia, and over 25 million HTML tables from a 500-million-page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
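
To illustrate why labeling cells and columns simultaneously can beat separate local decisions, here is a minimal toy sketch. It is not the graphical model proposed in the paper: the catalog, the token-overlap score, the `type_weight` parameter and the function names are all hypothetical, relations between column pairs are left out entirely, and inference is a plain exhaustive search over candidate column types.

```python
# Toy catalog mapping entity -> set of types (hypothetical, not taken from YAGO).
CATALOG = {
    "Paris_(city)": {"city"},
    "Paris_Hilton": {"person"},
    "London_(city)": {"city"},
    "Jack_London": {"person"},
    "Berlin_(city)": {"city"},
}


def text_score(cell, entity):
    """Jaccard overlap between the tokens of the cell text and of the entity name."""
    cell_tokens = set(cell.lower().split())
    name = entity.lower().replace("(", "").replace(")", "")
    entity_tokens = set(name.split("_"))
    return len(cell_tokens & entity_tokens) / len(cell_tokens | entity_tokens)


def annotate_column(cells, type_weight=1.0):
    """Jointly choose one column type and one entity per cell.

    The objective is the sum, over cells, of text similarity plus a bonus
    when the chosen entity belongs to the chosen column type.
    """
    all_types = sorted({t for types in CATALOG.values() for t in types})
    best = None
    for col_type in all_types:
        total, labels = 0.0, []
        for cell in cells:
            # Once the column type is fixed, cells can be scored independently.
            scored = [
                (text_score(cell, e) + type_weight * (col_type in CATALOG[e]), e)
                for e in sorted(CATALOG)
            ]
            score, entity = max(scored)
            total += score
            labels.append(entity)
        if best is None or total > best[0]:
            best = (total, col_type, labels)
    return best[1], best[2]


if __name__ == "__main__":
    print(annotate_column(["Paris", "London", "Berlin"]))
```

In this toy setting, the cell "Paris" on its own matches `Paris_(city)` and `Paris_Hilton` equally well by text alone. Scoring the whole column together lets the unambiguous cell "Berlin" favor the type `city`, which in turn resolves "Paris" and "London" to the city entities.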