Building re-usable dictionary repositories for real-world text mining

Authors:
Shantanu Godbole;Indrajit Bhattacharya;Ajay Gupta;Ashish Verma
Affiliations:
IBM Research - India, New Delhi, India;Indian Institute of Science, Bangalore, India;IBM Research - India, New Delhi, India;IBM Research - India, New Delhi, India
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

Text mining, though still a nascent industry, has been growing quickly along with awareness of the importance of unstructured data in business analytics, customer retention and extension, social media, and legal applications. The number of commercial text mining products and services has increased recently, but successful or widespread deployments remain rare, mainly because they depend on the expertise and skill of practitioners. Accordingly, there is a growing need for re-usable repositories for text mining. In this paper, we focus on dictionary-based text mining and its role in helping practitioners understand and analyze large text datasets. We motivate and define the problem of exploratory dictionary construction for capturing concepts of interest, and propose a framework for efficient construction, tuning, and re-use of these dictionaries across datasets. The construction framework offers the user a range of interaction modes for quickly building concept dictionaries over large datasets. We also show how to adapt one or more dictionaries across domains and tasks, thereby enabling re-use of knowledge and effort in industrial practice. We present results and case studies on real-life CRM analytics datasets, where such repositories and tooling significantly cut down practitioner time and effort for dictionary-based text mining.