This article describes methods for semiautomatic thesaurus construction for a cross-generation, cross-genre, and cross-cultural corpus. Semiautomatic thesaurus construction is a complex task, and applying it to a cross-generation corpus brings its own challenges. We used a Jewish juristic corpus containing documents and genres written over 2,000 years, which mix different languages, dialects, geographies, and writing styles. We evaluated different first-order and second-order methods and introduced a special annotation scheme for this problem, which showed that first-order methods performed surprisingly well. We found that, in our case, improving coverage is the more difficult task. For this we introduce a new algorithm to increase recall (coverage), which is applicable to many other problems as well and demonstrates significant improvement on our corpus.