Selective integration of background knowledge in TCBR systems

Authors:
Anil Patelia; Sutanu Chakraborti; Nirmalie Wiratunga
Affiliations:
Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India; Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India; School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK
Venue:
ICCBR'11 Proceedings of the 19th international conference on Case-Based Reasoning Research and Development
Year:
2011

Abstract

This paper explores how background knowledge from freely available web resources can be used for Textual Case-Based Reasoning (TCBR). The work extends the existing Explicit Semantic Analysis (ESA) approach to representation, in which textual content is represented using concepts that correspond to Wikipedia articles. We present approaches for identifying Wikipedia pages that are likely to contribute to the effectiveness of text classification tasks. We also empirically study the effect of modelling semantic similarity between concepts (that is, between Wikipedia articles). We conclude that integrating background knowledge from resources like Wikipedia into TCBR tasks holds considerable promise, as it can improve system effectiveness even without elaborate manual knowledge engineering. Significant performance gains are obtained using a very small number of features that correspond closely to how humans describe the domain.
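
To make the representation idea concrete, the following is a minimal sketch of a generic Explicit Semantic Analysis style mapping, in which a text is represented by its weighted association with concepts. It is not the authors' implementation: the three "articles" in CONCEPTS are hypothetical stand-ins for Wikipedia pages, the tf-idf weighting is one common choice assumed here, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (concept name -> article text).
CONCEPTS = {
    "Computer hardware": "processor memory disk motherboard hardware chip memory",
    "Baseball": "pitcher inning bat baseball league team inning",
    "Medicine": "patient doctor disease treatment medicine clinical patient",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def build_term_concept_weights(concepts):
    """Return {term: {concept: tf-idf weight}} computed over the concept articles."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    # Document frequency: in how many concept articles each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    weights = {}
    for name, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term]) + 1.0
            weights.setdefault(term, {})[name] = freq * idf
    return weights


def esa_vector(text, weights, concept_names):
    """Represent a text as a vector over concepts instead of over words."""
    vec = dict.fromkeys(concept_names, 0.0)
    for term, freq in Counter(tokenize(text)).items():
        for concept, w in weights.get(term, {}).items():
            vec[concept] += freq * w
    return vec


def cosine(u, v):
    """Cosine similarity between two concept vectors with the same keys."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


WEIGHTS = build_term_concept_weights(CONCEPTS)
NAMES = list(CONCEPTS)

case_a = esa_vector("the disk and memory failed", WEIGHTS, NAMES)
case_b = esa_vector("replace the processor chip", WEIGHTS, NAMES)
case_c = esa_vector("the pitcher threw in the last inning", WEIGHTS, NAMES)

print(cosine(case_a, case_b))  # two hardware cases with no words in common
print(cosine(case_a, case_c))  # a hardware case against a baseball case
```

In this toy setup, case_a and case_b share no words yet both map onto the "Computer hardware" concept, so they come out as similar in concept space, while case_c maps onto "Baseball" only. Restricting CONCEPTS to a small, well-chosen set of pages is the kind of selection step the abstract alludes to, although the selection criteria used in the paper are not given here.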