Methods for exploring and mining tables on Wikipedia

Authors:
Chandra Sekhar Bhagavatula;Thanapon Noraset;Doug Downey
Affiliations:
Northwestern University;Northwestern University;Northwestern University
Venue:
Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
Year:
2013

Abstract

Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover "interesting" relationships between table columns. We find that a "Semantic Relatedness" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
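
To make the table-joining task in the abstract concrete, here is a minimal sketch, not the WikiTables implementation. It assumes tables are lists of dictionaries, joins two of them on a shared column value, and scores two Wikipedia pages by the Jaccard overlap of their incoming-link sets. That overlap score is a simple stand-in for a link-based "Semantic Relatedness" measure; the abstract does not specify the actual measure. The function names (`join_tables`, `link_relatedness`) and the toy rows are hypothetical.

```python
from typing import Dict, List, Set

# Assumption: a table is a list of rows, each row mapping column name -> cell text.
Table = List[Dict[str, str]]


def join_tables(left: Table, right: Table, left_key: str, right_key: str) -> Table:
    """Equi-join two tables where left[left_key] equals right[right_key]."""
    # Index the right table by its key column so each lookup is O(1).
    index: Dict[str, List[Dict[str, str]]] = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined: Table = []
    for row in left:
        for match in index.get(row[left_key], []):
            merged = dict(row)
            # Add the right table's columns, skipping its duplicate key column.
            merged.update({k: v for k, v in match.items() if k != right_key})
            joined.append(merged)
    return joined


def link_relatedness(inlinks_a: Set[str], inlinks_b: Set[str]) -> float:
    """Jaccard overlap of two pages' incoming-link sets (a stand-in measure)."""
    union = inlinks_a | inlinks_b
    if not union:
        return 0.0
    return len(inlinks_a & inlinks_b) / len(union)


# Toy data for illustration only; not drawn from the released data set.
cities: Table = [
    {"city": "Chicago", "state": "Illinois"},
    {"city": "Evanston", "state": "Illinois"},
    {"city": "Austin", "state": "Texas"},
]
states: Table = [
    {"name": "Illinois", "capital": "Springfield"},
    {"name": "Texas", "capital": "Austin"},
]

# Each city row gains the "capital" column from the matching state row.
joined = join_tables(cities, states, "state", "name")
```

In a system of this kind, a relatedness score between the pages behind two columns could be one signal for ranking which candidate joins are worth showing to a user; how WikiTables combines such signals is not described in the abstract.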