WebPut: efficient web-based data imputation

Authors:
Zhixu Li, Mohamed A. Sharaf, Laurianne Sitbon, Shazia Sadiq, Marta Indulska, Xiaofang Zhou
Affiliations:
The University of Queensland, QLD, Australia (Zhixu Li, Mohamed A. Sharaf, Shazia Sadiq, Marta Indulska, Xiaofang Zhou); Queensland University of Technology, QLD, Australia (Laurianne Sitbon)
Venue:
WISE '12: Proceedings of the 13th International Conference on Web Information Systems Engineering
Year:
2012

Abstract

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut uses the information available in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods to formulate web search queries that retrieve missing values with high accuracy. WebPut employs a confidence-based scheme that leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. We also propose a greedy iterative algorithm that schedules the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve both the accuracy and the efficiency of WebPut. Experiments on several real-world data collections demonstrate that WebPut outperforms existing approaches.
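The interplay between confidence-based query selection and greedy scheduling can be illustrated with a minimal sketch. This is not WebPut's implementation: the `QueryTemplate` type, the fixed per-template confidence values, the functions `best_query`, `greedy_impute` and `fake_search`, and the dictionary standing in for web search results are all hypothetical, introduced only for illustration. Under these assumptions, each missing cell is scored by the most confident query whose input attributes are already known, the highest-scoring cell is imputed first, and its new value can then serve as input to queries for the remaining cells.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[str]]


@dataclass(frozen=True)
class QueryTemplate:
    """Hypothetical imputation query: looks up `target` using the `known` attributes."""
    target: str
    known: Tuple[str, ...]
    confidence: float  # assumed fixed prior that the query returns the correct value


def best_query(row: Row, attr: str, templates: List[QueryTemplate]) -> Optional[QueryTemplate]:
    """Highest-confidence template for `attr` whose input attributes are all present in `row`."""
    usable = [t for t in templates
              if t.target == attr and all(row.get(k) is not None for k in t.known)]
    return max(usable, key=lambda t: t.confidence, default=None)


def greedy_impute(table: List[Row], templates: List[QueryTemplate],
                  search: Callable[[QueryTemplate, Row], Optional[str]]) -> List[Tuple[int, str]]:
    """Repeatedly fill the missing cell whose best available query is most confident.

    Filled values become inputs for later queries, so every remaining cell is
    re-scored on each iteration. Returns the fill order as (row index, attribute).
    """
    order: List[Tuple[int, str]] = []
    gave_up: Set[Tuple[int, str]] = set()  # cells whose chosen query found nothing
    while True:
        candidates = []
        for i, row in enumerate(table):
            for attr, value in row.items():
                if value is None and (i, attr) not in gave_up:
                    q = best_query(row, attr, templates)
                    if q is not None:
                        candidates.append((q.confidence, i, attr, q))
        if not candidates:
            return order
        _, i, attr, q = max(candidates, key=lambda c: c[0])
        value = search(q, table[i])
        if value is None:
            gave_up.add((i, attr))  # simplification: the cell is never retried
        else:
            table[i][attr] = value
            order.append((i, attr))


# Toy usage: a dictionary stands in for the answers a web search would return.
FAKE_WEB = {
    ("institution", ("Alice",)): "QUT",
    ("city", ("Alice", "QUT")): "Brisbane",
    ("city", ("Bob", "UQ")): "Brisbane",
}


def fake_search(q: QueryTemplate, row: Row) -> Optional[str]:
    return FAKE_WEB.get((q.target, tuple(row[k] for k in q.known)))


templates = [
    QueryTemplate("institution", ("name",), 0.6),
    QueryTemplate("city", ("name", "institution"), 0.9),
    QueryTemplate("city", ("name",), 0.5),
]
table: List[Row] = [
    {"name": "Alice", "institution": None, "city": None},
    {"name": "Bob", "institution": "UQ", "city": None},
]
order = greedy_impute(table, templates, fake_search)
print(order)  # [(1, 'city'), (0, 'institution'), (0, 'city')]
```

In this toy run, Bob's city is filled first because its query has the highest confidence. Once Alice's institution is known, her city is looked up with the more confident two-attribute query rather than the name-only one, which is the kind of benefit an imputation order can bring. Scoring each query with a fixed prior is a deliberate simplification of the confidence-based scheme the abstract describes.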