Labeling data extracted from the web

Authors:
Altigran S. Da Silva;Denilson Barbosa;João M. B. Cavalcanti;Marco A. S. Sevalho
Affiliations:
Universidade Federal do Amazonas, Manaus, AM, Brazil;University of Calgary, Calgary, AB, Canada;Universidade Federal do Amazonas, Manaus, AM, Brazil;Universidade Federal do Amazonas, Manaus, AM, Brazil
Venue:
OTM'07 Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part I
Year:
2007

Abstract

We consider finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model to estimate the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from 8 different domains show that our methods achieve high accuracy even with very few search engine accesses.
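
The abstract does not spell out the probabilistic model, so the following is only a rough sketch of the general idea of scoring label candidates with search engine hit counts, not the authors' method. It assumes a hypothetical `hit_count` callable that returns the number of results for a query string, and it estimates the affinity between a label and an attribute as the average, over a few sample values of that attribute, of hits(label and value together) divided by hits(value alone). The function names, the query format, and the canned counts are all illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, List


def affinity(label: str, values: Iterable[str],
             hit_count: Callable[[str], int]) -> float:
    """Average over sample values of hits(label, value) / hits(value).

    Assumption: this ratio is a stand-in for an affinity estimate;
    the paper's actual model is not given in the abstract.
    """
    ratios: List[float] = []
    for value in values:
        base = hit_count(f'"{value}"')
        if base == 0:
            continue  # value never seen by the engine: no evidence either way
        joint = hit_count(f'"{label}" "{value}"')
        ratios.append(joint / base)
    return sum(ratios) / len(ratios) if ratios else 0.0


def best_label(candidates: Iterable[str], values: Iterable[str],
               hit_count: Callable[[str], int]) -> str:
    """Pick the candidate label with the highest estimated affinity."""
    sample = list(values)  # reuse the same sample for every candidate
    return max(candidates, key=lambda c: affinity(c, sample, hit_count))


# Hypothetical canned counts standing in for a real search engine.
FAKE_HITS: Dict[str, int] = {
    '"Manaus"': 100,
    '"Calgary"': 200,
    '"city" "Manaus"': 40,
    '"city" "Calgary"': 100,
    '"author" "Manaus"': 5,
    '"author" "Calgary"': 10,
}


def fake_hit_count(query: str) -> int:
    """Look up a canned count; unknown queries count as zero hits."""
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    column = ["Manaus", "Calgary"]
    print(best_label(["author", "city"], column, fake_hit_count))
```

Each call to `affinity` issues two queries per sample value, so the number of sampled values directly controls how many search engine accesses are spent per candidate label.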