Automated cleansing for spend analytics

Authors:
Moninder Singh;Jayant R. Kalagnanam;Sudhir Verma;Amit J. Shah;Swaroop K. Chalasani
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Citing 7
Cited 0

Efficient clustering of high-dimensional data sets with application to reference matching

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A guided tour to approximate string matching

ACM Computing Surveys (CSUR)
Modern Information Retrieval

Modern Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Term Weighting Approaches in Automatic Text Retrieval

Term Weighting Approaches in Automatic Text Retrieval
Relational learning techniques for natural language information extraction

Relational learning techniques for natural language information extraction
Text chunking based on a generalization of winnow

The Journal of Machine Learning Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

The development of an aggregate view of the procurement spend across an enterprise using transactional data is increasingly becoming a very important and strategic activity. Not only does it provide a complete and accurate picture of what the enterprise is buying and from whom, it also allows it to consolidate suppliers, as well as negotiate better prices. The importance, as well as the complexity, of this cleansing exercise is further magnified by the increasing popularity of Business Transformation Outsourcing (BTO) wherein enterprises are turning over non-core activities, such as indirect procurement, to third parties, who now need to develop an integrated view of spend across multiple enterprises in order to optimize procurement and generate maximum savings. However, the creation of such an integrated view of procurement spend requires the creation of a homogeneous data repository from disparate (heterogeneous) data sources across various geographic and functional organizations throughout the enterprise(s). Such repositories get transactional data from various sources such as invoices, purchase orders, account ledgers. As such, the transactions are not cross-indexed, refer to the same suppliers by different names, and use different ways of representing information about the same commodities. Before an aggregated spend view can be developed, this data needs to be cleansed, primarily to normalize the supplier names and correctly map each transaction to the appropriate commodity code. Commodity mapping, in particular, is made more difficult by the fact that it has to be done on the basis of unstructured text descriptions found in the various data sources. We describe an on-demand system to automatically perform this cleansing activity using techniques from information retrieval and machine learning. Built on standard integration and application infrastructure software, this system provides enterprises with a fast, reliable, accurate and on-demand way of cleansing transactional data and generating an integrated view of spend. This system is currently in the process of being deployed by IBM for use in its BTO practice.