Automatically building probabilistic databases from the web

Authors:
Lorenzo Blanco;Mirko Bronzi;Valter Crescenzi;Paolo Merialdo;Paolo Papotti
Affiliations:
Università Roma Tre, Roma, Italy;Università Roma Tre, Roma, Italy;Università Roma Tre, Roma, Italy;Università Roma Tre, Roma, Italy;Università Roma Tre, Roma, Italy
Venue:
Proceedings of the 20th international conference companion on World wide web
Year:
2011



Abstract

A large number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on huge amounts of data taken from the Web. We present an automatic, domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of this processing can be used to populate a probabilistic database.
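The abstract does not spell out the probabilistic model used in the final step. As a minimal sketch of the general idea only (jointly estimating how likely each conflicting value is and how accurate each source is), the code below iterates accuracy-weighted voting until it reaches a fixed point. This is a generic truth-discovery heuristic, not the authors' model; the function `reconcile`, its parameters, the 0.8 prior accuracy and the sample sites and values are all hypothetical. The per-item value distributions it returns have the shape of data one could load into a probabilistic database.

```python
from collections import defaultdict


def reconcile(claims, iterations=20, prior_accuracy=0.8):
    """Hypothetical sketch: accuracy-weighted voting iterated to a fixed point.

    claims maps (source, item) -> the value that source reports for item.
    Returns (value_probs, accuracy): value_probs[item][value] is a probability
    and accuracy[source] is the estimated accuracy of that source.
    """
    sources = {source for source, _ in claims}
    # Assumption: every source starts with the same prior accuracy.
    accuracy = {source: prior_accuracy for source in sources}
    value_probs = {}
    for _ in range(iterations):
        # Step 1: score each candidate value by the accuracy of its supporters.
        scores = defaultdict(lambda: defaultdict(float))
        for (source, item), value in claims.items():
            scores[item][value] += accuracy[source]
        # Normalize the scores of each item into a probability distribution.
        value_probs = {
            item: {v: s / sum(vals.values()) for v, s in vals.items()}
            for item, vals in scores.items()
        }
        # Step 2: a source's accuracy is the mean probability of what it reports.
        totals, counts = defaultdict(float), defaultdict(int)
        for (source, item), value in claims.items():
            totals[source] += value_probs[item][value]
            counts[source] += 1
        accuracy = {source: totals[source] / counts[source] for source in sources}
    return value_probs, accuracy


# Hypothetical example: three sites publishing stock prices, one of which disagrees.
claims = {
    ("siteA", "ACME price"): 10.5,
    ("siteB", "ACME price"): 10.5,
    ("siteC", "ACME price"): 12.0,
    ("siteA", "XYZ price"): 3.2,
    ("siteB", "XYZ price"): 3.2,
    ("siteC", "XYZ price"): 3.2,
}
value_probs, accuracy = reconcile(claims)
print(value_probs["ACME price"])
print(accuracy)
```

In this toy run, 10.5 ends up more probable than 12.0 for "ACME price", and siteC, the only site that disagrees, gets a lower estimated accuracy than siteA and siteB. A real system would also have to handle copying between sources and values that are close but not identical, which this sketch ignores.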