A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of the previous steps. Classification is accomplished by an ad hoc exploitation of traditional association mining algorithms, and is supported by a data transformation scheme which takes advantage of domain-specific dictionaries/ontologies. A key feature is the capability of progressively enriching the available ontology with the results of the previous stages of classification, thus significantly improving the overall classification accuracy. An extensive experimental evaluation shows the effectiveness of our approach.
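The multi-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written rules below stand in for classifiers that the paper learns via association mining, and every name here (`dictionary_rule`, `after_label_rule`, `progressive_classify`, the `NAME`/`ADDRESS` labels, the sample record) is a hypothetical chosen for illustration. The sketch only shows the control flow described above: each stage looks at the tokens left unlabeled by earlier stages, and every newly labeled token is added to the dictionary so that later stages and later records can use it.

```python
from typing import Callable, Dict, List, Optional

Labels = Dict[int, str]      # token position -> attribute label
Ontology = Dict[str, str]    # lower-cased token -> attribute label
Rule = Callable[[List[str], int, Labels], Optional[str]]


def dictionary_rule(ontology: Ontology) -> Rule:
    """Label a token if it already appears in the (hypothetical) dictionary."""
    def rule(tokens: List[str], i: int, labels: Labels) -> Optional[str]:
        return ontology.get(tokens[i].lower())
    return rule


def after_label_rule(prev_label: str, new_label: str) -> Rule:
    """Stand-in for a learnt rule: label a token based on its left neighbour."""
    def rule(tokens: List[str], i: int, labels: Labels) -> Optional[str]:
        if i > 0 and labels.get(i - 1) == prev_label:
            return new_label
        return None
    return rule


def progressive_classify(
    tokens: List[str],
    stages: List[Callable[[Ontology], List[Rule]]],
    ontology: Ontology,
) -> Labels:
    """Run the stages in order; each stage sees only still-unlabeled tokens."""
    labels: Labels = {}
    for make_rules in stages:
        rules = make_rules(ontology)
        for i, token in enumerate(tokens):
            if i in labels:
                continue  # already reconciled by an earlier stage
            for rule in rules:
                label = rule(tokens, i, labels)
                if label is not None:
                    labels[i] = label
                    # Enrichment step: remember the token for later use.
                    ontology[token.lower()] = label
                    break
    return labels


# Toy record and seed dictionary (both invented for this example).
ontology: Ontology = {"mario": "NAME", "via": "ADDRESS"}
stages = [
    lambda ont: [dictionary_rule(ont)],
    lambda ont: [
        after_label_rule("NAME", "NAME"),
        after_label_rule("ADDRESS", "ADDRESS"),
    ],
]
labels = progressive_classify(["Mario", "Rossi", "via", "Roma", "12"], stages, ontology)
```

In this toy run the first stage labels only the two tokens found in the seed dictionary; the second stage labels the remaining tokens from their already-labeled neighbours and adds them to the dictionary, so a later record containing "Rossi" would be recognized in the first stage.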