By aligning definitions from two or more sources, it is possible to retrieve pairs of words that can be used interchangeably in the same sentence without changing the meaning of the concept. Because lexicographic work relies on common defining schemes, such as genus and differentia, different dictionaries define a concept in similar ways. The differences in wording between two lexicographic sources let us extend the lexical knowledge base: words can be clustered by merging two or more dictionaries into a single database and then applying an appropriate alignment technique. Since alignment starts from the same entry in both dictionaries, clustering is faster than with other techniques.

The algorithm introduced here is analogy-based. It starts by calculating the Levenshtein distance, a form of edit distance, which allows us to align the definitions. As a measure of similarity, the concept of the longest collocation couple is introduced, which forms the basis for clustering similar words. The process iterates, replacing similar pairs of words in the definitions, until no new clusters are found.
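The alignment step can be illustrated with a minimal sketch, assuming the Levenshtein distance is computed over word tokens rather than characters. The function names `align` and `substitution_pairs` and the two sample definitions are hypothetical and not taken from the paper. The longest collocation couple measure is not specified in this abstract, so it is not implemented here; the sketch only shows how aligned definitions of the same entry could yield candidate word pairs.

```python
def align(a, b):
    """Word-level Levenshtein alignment of two token lists.

    Returns (distance, ops), where ops is a list of (word_a, word_b)
    pairs; None marks a word with no counterpart on the other side.
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a word of a
                dist[i][j - 1] + 1,         # insert a word of b
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            ops.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append((a[i - 1], None))
            i -= 1
        else:
            ops.append((None, b[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


def substitution_pairs(ops):
    """Keep only aligned positions where the two definitions differ."""
    return [(x, y) for x, y in ops
            if x is not None and y is not None and x != y]


# Hypothetical definitions of the same entry from two dictionaries.
def_a = "a large body of water".split()
def_b = "a big body of water".split()
distance, ops = align(def_a, def_b)
pairs = substitution_pairs(ops)
```

In this sketch a substituted pair is only a candidate for clustering. The iteration described in the abstract would replace accepted pairs in the definitions and re-run the alignment until no new pairs appear.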