In this paper, we present a method for database schema matching: the problem of identifying elements of two given schemas that correspond to each other. Schema matching is useful in e-commerce exchanges, in data integration and warehousing, and in semantic web applications. We first present two corpus-based methods: one determines the semantic similarity of two target words, and the other performs automatic word segmentation. We then present a name-based, element-level database schema matching method that exploits both. Our word similarity method uses pointwise mutual information (PMI) to sort lists of important neighbor words of the two target words; the words common to both lists are selected, and their PMI values are aggregated to calculate the relative similarity score. Our word segmentation method uses corpus type frequency information to choose the type with maximum length and frequency from "desegmented" text. If any non-matching portions of the text remain, it also applies a modified forward-backward matching technique based on maximum length, frequency, and entropy rate. The schema matching method uses a single property (the element name) and nevertheless achieves a measure score comparable to that of methods that use multiple properties (e.g., element name, text description, data instance, context description). It also uses normalized and modified versions of the longest common subsequence string matching algorithm, with weight factors to allow for a balanced combination. We validate our methods with experimental studies, the results of which suggest that these methods can be a useful addition to the set of existing methods.
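As a rough illustration of two building blocks named in the abstract, the following Python sketch computes PMI from raw co-occurrence counts and a normalized longest common subsequence (LCS) score for two element names. It is not the paper's implementation. The function names, the normalization (squared LCS length divided by the product of the two string lengths), and the weights in `combined_score` are assumptions made here for illustration only.

```python
import math


def pmi(pair_count, count_x, count_y, total):
    """Pointwise mutual information in bits: log2(P(x, y) / (P(x) * P(y))).

    All arguments are raw corpus counts; `total` is the corpus size.
    """
    return math.log2((pair_count * total) / (count_x * count_y))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length squared over the product of the lengths, in [0, 1].

    This normalization is an assumption of this sketch.
    """
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


def combined_score(string_sim, semantic_sim, w_string=0.5, w_semantic=0.5):
    """Weighted combination of two similarity scores (hypothetical weights)."""
    return w_string * string_sim + w_semantic * semantic_sim


if __name__ == "__main__":
    # "custname" is a subsequence of "customername", so the LCS length is 8.
    print(lcs_length("custname", "customername"))
    print(normalized_lcs("custname", "customername"))
    print(pmi(8, 100, 100, 10000))
```

A string score of this kind captures abbreviations such as `custname` versus `customername`, while a corpus-based semantic score would be needed for names that share meaning but not characters. That is one plausible reading of why the abstract combines the two with weight factors.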