In this paper, we present WebPut, a prototype system that takes a novel web-based approach to the data imputation problem. To this end, WebPut combines the information available in an incomplete database with the data consistency principle. WebPut also extends effective Information Extraction (IE) methods to formulate web search queries that retrieve missing values with high accuracy. It employs a confidence-based scheme that draws on this suite of data imputation queries to automatically select the most effective imputation query for each missing value. We also propose a greedy iterative algorithm that schedules the order in which missing values in a database are imputed, and thus the order in which their corresponding imputation queries are issued, to improve the accuracy and efficiency of WebPut. Experiments on several real-world data collections demonstrate that WebPut outperforms existing approaches.
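The greedy, confidence-driven scheduling idea described in the abstract can be illustrated with a small sketch. This is not WebPut's actual algorithm, whose details are not given here; it is a minimal illustration under stated assumptions. The names `greedy_impute`, `toy_confidence`, and `toy_lookup` are hypothetical, the confidence score (fraction of known attributes in the same row) is an assumed stand-in for the paper's confidence scheme, and the dictionary `FAKE_WEB` stands in for issuing a web search query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Cell = Tuple[int, str]  # (row index, attribute name)


def missing_cells(table: List[Row]) -> List[Cell]:
    """Return the coordinates of every cell whose value is None."""
    return [(i, attr) for i, row in enumerate(table)
            for attr, value in row.items() if value is None]


def greedy_impute(
    table: List[Row],
    confidence: Callable[[Row, str], float],
    lookup: Callable[[Row, str], Optional[str]],
) -> List[Cell]:
    """Fill missing cells in place, always taking the most confident one next.

    Returns the order in which cells were attempted.
    """
    order: List[Cell] = []
    pending = set(missing_cells(table))
    while pending:
        # Re-score on every pass: values filled earlier can raise the
        # confidence of the remaining cells. Sorting makes ties deterministic.
        best = max(sorted(pending), key=lambda c: confidence(table[c[0]], c[1]))
        pending.remove(best)
        value = lookup(table[best[0]], best[1])
        if value is not None:
            table[best[0]][best[1]] = value
        order.append(best)
    return order


def toy_confidence(row: Row, attr: str) -> float:
    """Assumed score: fraction of the row's other attributes that are known."""
    others = [v for a, v in row.items() if a != attr]
    return sum(v is not None for v in others) / len(others) if others else 0.0


# Hypothetical stand-in for a web search; keyed by (name, attribute).
FAKE_WEB = {
    ("Alice", "city"): "Rome",
    ("Alice", "country"): "Italy",
    ("Bob", "country"): "France",
}


def toy_lookup(row: Row, attr: str) -> Optional[str]:
    """Pretend to query the web using the row's name as the search key."""
    return FAKE_WEB.get((row.get("name"), attr))


table = [
    {"name": "Alice", "city": None, "country": None},
    {"name": "Bob", "city": "Paris", "country": None},
]
order = greedy_impute(table, toy_confidence, toy_lookup)
print(order)
print(table)
```

In this toy run, Bob's country is attempted first because every other attribute in that row is already known, and each value filled for Alice raises the score of her remaining missing cell. A real system would replace `toy_lookup` with actual query formulation and answer extraction, and `toy_confidence` with an estimate learned from the complete part of the database.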