Joke retrieval: recognizing the same joke told differently

Authors:
Lisa Friedland; James Allan
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA; University of Massachusetts Amherst, Amherst, MA, USA
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold with an entirely different vocabulary while still maintaining its identity. Since most retrieval systems consider documents to be related only when their word content is similar, we propose joke retrieval as a domain where standard language models may fail. Other meaning-centric domains include logic puzzles, proverbs, and recipes; in such domains, new techniques may be required to search effectively. For jokes, a necessary component of any retrieval system will be the ability to identify the "same joke," so we examine this task in both ranking and classification settings. We exploit the structure of jokes to develop two domain-specific alternatives to the "bag of words" document model. In the first, only the punch lines, or final sentences, are compared; in the second, certain categories of words (e.g., professions and countries) are tagged and treated as interchangeable. Each technique works well for certain jokes. By combining the methods using machine learning, we create a hybrid that achieves higher performance than any individual approach.
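The two alternative document models and their combination can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The punch line is taken to be the final sentence, as the abstract describes. `CATEGORY_LEXICON` is a tiny hypothetical word list standing in for whatever tagger the paper uses. Cosine similarity over term counts is swapped in as a simple stand-in for language-model scoring. `hybrid_probability` is a logistic combiner chosen only for illustration, and its weights would have to be learned from labeled pairs. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL lexicon: the paper tags categories such as professions and
# countries, but this tiny word list is illustrative, not the authors' tagger.
CATEGORY_LEXICON = {
    "doctor": "PROFESSION", "lawyer": "PROFESSION", "engineer": "PROFESSION",
    "france": "COUNTRY", "italy": "COUNTRY", "germany": "COUNTRY",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def punch_line(text):
    """Approximate the punch line as the final sentence of the joke."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return sentences[-1] if sentences else ""


def tag_categories(tokens):
    """Replace each word found in the lexicon with its category tag."""
    return [CATEGORY_LEXICON.get(tok, tok) for tok in tokens]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters).
    Stand-in for the language-model scoring the paper builds on."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(joke_a, joke_b):
    """Score a pair of jokes under three document models."""
    full = cosine(Counter(tokenize(joke_a)), Counter(tokenize(joke_b)))
    punch = cosine(Counter(tokenize(punch_line(joke_a))),
                   Counter(tokenize(punch_line(joke_b))))
    tagged = cosine(Counter(tag_categories(tokenize(joke_a))),
                    Counter(tag_categories(tokenize(joke_b))))
    return {"full_text": full, "punch_line": punch, "tagged": tagged}


def hybrid_probability(feats, weights, bias):
    """Logistic combination of feature scores; weights and bias must be
    learned from labeled same-joke / different-joke pairs (not shown)."""
    z = bias + sum(weights[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Invented toy pair: same joke, different profession and country.
    a = ("A doctor from France visits a farm. He asks why the cow is smiling. "
         "The farmer says it heard your joke.")
    b = ("A lawyer from Italy visits a farm. He asks why the cow is smiling. "
         "The farmer says it heard your joke.")
    print(features(a, b))
```

On this invented pair, the jokes differ only in profession and country. The full-text score is therefore 23/25 = 0.92, while the punch-line and tagged scores are both 1.0 (up to floating-point rounding). In a ranking setting, candidate jokes could be sorted by any one of these scores. In a classification setting, the three scores would serve as features for a learned model such as the logistic combiner above.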