Characterizing the uncertainty of web data: models and experiences

Authors:
Lorenzo Blanco;Valter Crescenzi;Paolo Merialdo;Paolo Papotti
Affiliations:
Università degli Studi Roma Tre, Rome, Italy;Università degli Studi Roma Tre, Rome, Italy;Università degli Studi Roma Tre, Rome, Italy;Università degli Studi Roma Tre, Rome, Italy
Venue:
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality
Year:
2011

Abstract

An increasing number of web sites offer structured information about recognizable concepts that are relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. This paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of these models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while more sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.
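
The contrast between a simple approach and a Bayesian one can be illustrated with a small sketch. This is not one of the models evaluated in the paper; it is a generic accuracy-weighted vote under assumptions introduced here for illustration only: sources are independent, each source has a known accuracy strictly between 0 and 1, wrong values are spread uniformly over `n_false` alternatives, and the true value is among the values observed. The function names, site names, and numbers are all hypothetical.

```python
import math
from collections import defaultdict


def majority_vote(claims):
    """Simple baseline: return the value claimed by the most sources.

    claims maps a source name to the value that source reports.
    """
    counts = defaultdict(int)
    for value in claims.values():
        counts[value] += 1
    return max(counts, key=counts.get)


def bayesian_posterior(claims, accuracy, n_false=10):
    """Accuracy-weighted posterior over the observed values.

    Assumptions (illustrative, not taken from the paper): sources are
    independent, accuracy[s] is in (0, 1), and a wrong source picks one
    of n_false false values uniformly at random. Under these assumptions
    each source adds log(n_false * a / (1 - a)) to the score of the
    value it claims, and the posterior is the normalized exp(score).
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        a = accuracy[source]
        scores[value] += math.log(n_false * a / (1.0 - a))
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Hypothetical example: three sites report a stock price.
claims = {"site_a": "10.5", "site_b": "10.7", "site_c": "10.7"}
accuracy = {"site_a": 0.99, "site_b": 0.6, "site_c": 0.6}

print(majority_vote(claims))
print(bayesian_posterior(claims, accuracy))
```

In this made-up example the majority picks the value reported by the two less accurate sites, while the accuracy-weighted posterior favors the value from the single highly accurate site. In practice source accuracies are not known in advance and must themselves be estimated, which is where more sophisticated models come in.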