Automatically generating term-frequency-induced taxonomies
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Estimating accuracy for text classification tasks on large unlabeled data
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Address cleansing is very challenging, particularly for geographies where the way addresses are written varies widely. Supervised learners can easily be trained for individual data sources; however, training requires labeling a large corpus for each data source, which is time-consuming and labor-intensive. We propose a method that uses a hierarchical Dirichlet process to automatically transfer supervision from a labeled source to an unlabeled target source. Each Dirichlet process models the data from one source, and the component distributions shared across these Dirichlet processes capture the semantic relations between the sources. A feature projection on the component distributions from the multiple sources is then used to transfer supervision.
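The abstract does not spell out the model or the projection, so the following is only a rough sketch under stated assumptions. It uses a finite (truncated) approximation of the hierarchical Dirichlet process: global weights from truncated stick-breaking, one Dirichlet-distributed weight vector per source, and a single set of component distributions over address tokens shared by all sources. A record is projected onto the shared components by averaging per-token component posteriors, so a classifier trained on labeled-source projections can be applied to target-source projections. The names `TruncatedHDP`, `project`, and `nearest_centroid`, the hyperparameters, and the toy vocabulary are hypothetical. Parameters are drawn from the prior rather than fitted by posterior inference (e.g. Gibbs sampling or variational inference). The nearest-centroid classifier is a stand-in for whatever supervised learner the authors actually use.

```python
import math
import random
from typing import List, Sequence


def stick_breaking(gamma: float, k: int, rng: random.Random) -> List[float]:
    """Truncated GEM(gamma) weights: the global mixing weights (beta) of the HDP."""
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last stick absorbs the leftover mass
    return weights


def dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Sample a Dirichlet vector by normalising independent Gamma draws."""
    # Floors guard against zero concentrations and Gamma underflow.
    draws = [max(rng.gammavariate(max(a, 1e-6), 1.0), 1e-300) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


class TruncatedHDP:
    """Hypothetical finite HDP sketch: K shared components, one DP per source.

    Parameters come from the prior, not from inference on real data.
    """

    def __init__(self, vocab, sources, k=8, gamma=1.0, alpha=1.0, eta=0.5, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        # Global weights shared by every source's DP (G_0 ~ DP(gamma, H)).
        self.beta = stick_breaking(gamma, k, rng)
        # Per-source DP G_j ~ DP(alpha, G_0), truncated: pi_j ~ Dirichlet(alpha * beta).
        self.pi = {s: dirichlet([alpha * b for b in self.beta], rng) for s in sources}
        # Shared components: each is a categorical distribution over the vocabulary.
        self.phi = [dirichlet([eta] * len(self.vocab), rng) for _ in range(k)]

    def token_posterior(self, token: str, source: str) -> List[float]:
        """p(component k | token, source), proportional to pi_source[k] * phi_k[token]."""
        i = self.index[token]
        scores = [p * comp[i] for p, comp in zip(self.pi[source], self.phi)]
        total = sum(scores)
        if total == 0.0:  # numerical underflow: fall back to uniform
            return [1.0 / len(scores)] * len(scores)
        return [s / total for s in scores]

    def project(self, tokens: Sequence[str], source: str) -> List[float]:
        """Feature projection: average component posteriors over a record's tokens."""
        k = len(self.phi)
        known = [t for t in tokens if t in self.index]
        if not known:
            return [1.0 / k] * k
        vecs = [self.token_posterior(t, source) for t in known]
        return [sum(v[j] for v in vecs) / len(vecs) for j in range(k)]


def nearest_centroid(train: List[List[float]], labels: List[str], query: List[float]) -> str:
    """Stand-in classifier: fit on source projections, label a target projection."""
    groups = {}
    for vec, lab in zip(train, labels):
        groups.setdefault(lab, []).append(vec)
    means = {lab: [sum(col) / len(vs) for col in zip(*vs)] for lab, vs in groups.items()}

    def dist(a: List[float], b: List[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(means, key=lambda lab: dist(means[lab], query))


if __name__ == "__main__":
    vocab = ["12", "main", "st", "street", "apt", "4b", "flat", "road", "rd"]
    hdp = TruncatedHDP(vocab, sources=["labeled", "unlabeled"], k=6, seed=42)
    src = [hdp.project(["12", "main", "st"], "labeled"),
           hdp.project(["apt", "4b"], "labeled")]
    labels = ["street_line", "unit_line"]
    tgt = hdp.project(["flat", "4b"], "unlabeled")
    print(nearest_centroid(src, labels, tgt))
```

The key design point is that both sources are projected onto the same K shared components, so the labeled and unlabeled records live in one feature space. Because the parameters here are unfitted prior draws, the printed label only demonstrates the data flow, not a meaningful transfer result.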