Automatic thesaurus construction for cross generation corpus

Authors:
Hadas Zohar;Chaya Liebeskind;Jonathan Schler;Ido Dagan
Affiliations:
Bar-Ilan University, Israel;Bar-Ilan University, Israel;Bar-Ilan University, Israel;Bar-Ilan University, Israel
Venue:
Journal on Computing and Cultural Heritage (JOCCH)
Year:
2013

Abstract

This article describes methods for semiautomatic thesaurus construction for a cross-generation, cross-genre, and cross-cultural corpus. Semiautomatic thesaurus construction is a complex task, and applying it to a cross-generation corpus brings its own challenges. We used a Jewish juristic corpus whose documents span several genres, were written over 2,000 years, and mix different languages, dialects, geographies, and writing styles. We evaluated different first-order and second-order methods and introduced a special annotation scheme for this problem, which showed that first-order methods performed surprisingly well. We found that, in our case, improving coverage is the more difficult task. To address it, we introduce a new algorithm for increasing recall (coverage), which is applicable to many other problems as well and yields a significant improvement on our corpus.
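
The distinction between first-order and second-order methods can be illustrated with a minimal sketch. The abstract does not say which measures were evaluated, so the Dice coefficient (first order: do two terms occur in the same documents?) and cosine similarity over co-occurrence vectors (second order: do two terms share the same neighbours?) are stand-ins chosen for illustration. The toy documents and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per term and per term pair, the number of documents containing them."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_df, pair_df


def first_order(a, b, term_df, pair_df):
    """Dice coefficient: direct co-occurrence of a and b in the same documents."""
    denom = term_df[a] + term_df[b]
    return 2 * pair_df[frozenset((a, b))] / denom if denom else 0.0


def context_vector(term, term_df, pair_df):
    """Sparse vector of co-occurrence counts between `term` and every other term."""
    vec = {}
    for other in term_df:
        count = pair_df[frozenset((term, other))] if other != term else 0
        if count:
            vec[other] = count
    return vec


def second_order(a, b, term_df, pair_df):
    """Cosine similarity between the co-occurrence vectors of a and b."""
    va = context_vector(a, term_df, pair_df)
    vb = context_vector(b, term_df, pair_df)
    dot = sum(w * vb.get(t, 0) for t, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy corpus: "physician" and "doctor" never share a document,
# but they appear alongside the same neighbouring terms.
docs = [
    ["physician", "heals", "patient"],
    ["doctor", "heals", "patient"],
    ["physician", "treats", "patient"],
    ["doctor", "treats", "wound"],
]
term_df, pair_df = cooccurrence_counts(docs)

direct = first_order("physician", "doctor", term_df, pair_df)
indirect = second_order("physician", "doctor", term_df, pair_df)
print(f"first order:  {direct:.3f}")
print(f"second order: {indirect:.3f}")
```

In this toy setting the first-order score for the pair is zero because the two terms never co-occur, while the second-order score is high because their contexts overlap. This only shows how the two families differ; it says nothing about which performs better on a real corpus.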