The task of object identification arises when integrating information from multiple websites. The same data object can appear in inconsistent text formats across sites, which makes it difficult to identify matching objects by exact text matching. Previous object identification methods have required either manually constructing domain-specific string transformations or manually setting the weights of general transformation parameters in order to recognize format inconsistencies. This manual process can be time-consuming and error-prone. We have developed an object identification system called Active Atlas [18], which applies a set of domain-independent string transformations to compare the objects' shared attributes and thereby identify matching objects. In this paper, we discuss extensions to the Active Atlas system that allow it to learn, from limited user input, how to tailor the weights of a set of general transformations to a specific application domain. The experimental results demonstrate that this approach achieves higher accuracy and requires less user involvement than previous methods across various application domains.