We consider the problem of finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model that estimates the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We also discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from eight different domains show that our methods achieve high accuracy even with very few search engine accesses.
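The abstract does not spell out the probabilistic model, so the following is only an illustrative sketch of how an attribute-label affinity could be estimated from search engine hit counts. It substitutes a pointwise-mutual-information style score for the paper's model, and every name in it (`affinity`, `best_label`, the `hit_count` callback, `total_pages`) is a hypothetical assumption rather than something taken from the paper or from any real search API.

```python
import math
from typing import Callable, List


def affinity(label: str, values: List[str],
             hit_count: Callable[[str], int], total_pages: int) -> float:
    """Average PMI-style score between a candidate label and attribute values.

    ASSUMPTION: hit_count(query) returns the number of pages matching the
    query; it stands in for a search engine and is not a real API.
    """
    label_hits = hit_count(f'"{label}"')
    if label_hits == 0:
        return float("-inf")
    scores = []
    for value in values:
        value_hits = hit_count(f'"{value}"')
        joint_hits = hit_count(f'"{label}" "{value}"')
        if value_hits == 0 or joint_hits == 0:
            continue  # no evidence for this value; skip it
        # log of observed co-occurrence over what independence would predict
        scores.append(math.log(joint_hits * total_pages
                               / (label_hits * value_hits)))
    return sum(scores) / len(scores) if scores else float("-inf")


def best_label(candidates: List[str], values: List[str],
               hit_count: Callable[[str], int], total_pages: int) -> str:
    """Pick the candidate label with the highest affinity to the values."""
    return max(candidates,
               key=lambda c: affinity(c, values, hit_count, total_pages))
```

In this sketch each candidate label costs one query for the label plus two queries per sampled value, so using only a few values per attribute keeps the number of search engine accesses small.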