Truth finding on the deep web: is the problem solved?

  • Authors:
  • Xian Li; Xin Luna Dong; Kenneth Lyons; Weiyi Meng; Divesh Srivastava

  • Affiliations:
  • SUNY at Binghamton; AT&T Labs-Research; AT&T Labs-Research; SUNY at Binghamton; AT&T Labs-Research

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2012

Abstract

The amount of useful information available on the Web has been growing at a dramatic pace in recent years, and people rely more and more on the Web to fulfill their information needs. In this paper, we study the truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people's lives: Stock and Flight. To our surprise, we observed a large amount of inconsistency among data from different sources, as well as some sources with quite low accuracy. We further applied state-of-the-art data fusion methods, which aim to resolve conflicts and find the truth, to these two data sets, analyzed their strengths and limitations, and suggested promising research directions. We hope our study will increase awareness of the seriousness of conflicting data on the Web and in turn inspire more research in our community to tackle this problem.
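The abstract refers to data fusion methods that resolve conflicts among sources without describing them. As a hedged illustration only, the sketch below contrasts two generic baselines: naive majority voting, and an iterative scheme that weights each source's vote by its estimated accuracy and re-estimates that accuracy from the current answers. It is not presented as any specific method evaluated in the paper; the functions `majority_vote` and `accuracy_weighted_vote`, the log-odds weighting, the initial accuracy of 0.8, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Tuple

# Hypothetical input format: source -> {data item -> value claimed by that source}.
Claims = Dict[str, Dict[str, Hashable]]


def majority_vote(claims: Claims) -> Dict[str, Hashable]:
    """Pick, for each item, the value claimed by the most sources."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for items in claims.values():
        for item, value in items.items():
            votes[item][value] += 1
    # Counter.most_common breaks ties by first-seen order.
    return {item: counter.most_common(1)[0][0] for item, counter in votes.items()}


def accuracy_weighted_vote(
    claims: Claims,
    iterations: int = 10,
    init_accuracy: float = 0.8,  # hypothetical prior accuracy for every source
    eps: float = 1e-3,           # clamp so the log-odds weight stays finite
) -> Tuple[Dict[str, Hashable], Dict[str, float]]:
    """Alternate between (1) voting with accuracy-based weights and
    (2) re-estimating each source's accuracy against the current truth."""
    accuracy = {source: init_accuracy for source in claims}
    truth: Dict[str, Hashable] = {}
    for _ in range(iterations):
        # Step 1: each source votes with weight log(A / (1 - A)).
        scores: Dict[str, Dict[Hashable, float]] = defaultdict(lambda: defaultdict(float))
        for source, items in claims.items():
            a = min(max(accuracy[source], eps), 1 - eps)
            weight = math.log(a / (1 - a))
            for item, value in items.items():
                scores[item][value] += weight
        truth = {item: max(vals, key=vals.get) for item, vals in scores.items()}
        # Step 2: accuracy = fraction of a source's claims matching the current truth.
        for source, items in claims.items():
            if items:
                hits = sum(truth[item] == value for item, value in items.items())
                accuracy[source] = hits / len(items)
    return truth, accuracy


# Hypothetical toy data: s3 and s4 agree on x but are wrong on i1 and i2.
example_claims = {
    "s1": {"i1": 1, "i2": 1, "i3": 1, "x": "A"},
    "s2": {"i1": 1, "i2": 1, "i3": 1},
    "s3": {"i1": 2, "i2": 2, "i3": 1, "x": "B"},
    "s4": {"i1": 3, "i2": 3, "i3": 1, "x": "B"},
}

print(majority_vote(example_claims)["x"])   # B
truth, accuracy = accuracy_weighted_vote(example_claims)
print(truth["x"], accuracy["s3"])           # A 0.25
```

In this toy data, s3 and s4 outvote s1 on item x, so majority voting picks B. Once s3 and s4 are found to disagree with s1 and s2 on other items, their estimated accuracy drops, their weights fall to zero or below, and accuracy weighting picks A instead. Some fusion methods go further and model copying between sources, which this sketch ignores.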