A social network-based inference model for validating customer profile data

Authors:
Sung-Hyuk Park; Soon-Young Huh; Wonseok Oh; Sang Pil Han
Affiliations:
College of Business, Korea Advanced Institute of Science and Technology, Seoul, Korea; College of Business, Korea Advanced Institute of Science and Technology, Seoul, Korea; School of Business, Yonsei University, Seoul, Korea; College of Business, City University of Hong Kong, Kowloon Tong, Hong Kong
Venue:
MIS Quarterly
Year:
2012

Abstract

Drawing on social and relational perspectives, this study offers a new conceptualization of, and operational approach to, validating self-reported customer demographic data, which has become an essential corporate asset for business intelligence. Specifically, building on the social network and homophily paradigms, which hold that individuals tend to associate and interact frequently with others who share similar characteristics, we constructed a relational inference model to determine the accuracy of self-reported consumer profiles. To further enhance the reliability of the model's predictions, we employed an entropy mechanism that minimizes the biases that may arise from a simple probabilistic approach. To empirically validate the accuracy of our inference framework, we analyzed over 20 million actual call transactions supplied by one of the largest global telecommunication service providers. The results suggest that our social network-based inference model consistently outperforms competing mechanisms (e.g., weighted average and simple relational classifier) regardless of the criterion chosen (e.g., number of call receivers, call duration, or call frequency), with an accuracy rate of approximately 93 percent. Finally, to assess the generalizability of our findings, we conducted simulation experiments that test the robustness of the results to variations in parameter values and to increasing noise in the data. We discuss several implications for business intelligence research and practice and offer directions for future studies.
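
The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a weighted-vote relational classifier over a customer's call neighbors (weights standing in for a criterion such as call frequency) and uses normalized Shannon entropy to withhold a verdict when the neighbors disagree too much. The function names, the label strings, and the 0.8 entropy threshold are all hypothetical.

```python
import math
from collections import defaultdict


def neighbor_distribution(neighbors):
    """Turn (reported_label, weight) pairs into a probability distribution.

    The weight is an assumed tie-strength measure, e.g. call frequency.
    """
    totals = defaultdict(float)
    for label, weight in neighbors:
        totals[label] += weight
    total = sum(totals.values())
    if total == 0:
        return {}
    return {label: w / total for label, w in totals.items()}


def normalized_entropy(dist):
    """Shannon entropy scaled to [0, 1]; 0 means all neighbors agree."""
    if len(dist) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))


def validate(reported, neighbors, max_entropy=0.8):
    """Compare a self-reported label with the label inferred from neighbors.

    Returns "unknown" when there is no evidence or the neighbor
    distribution is too uncertain (hypothetical threshold).
    """
    dist = neighbor_distribution(neighbors)
    if not dist or normalized_entropy(dist) > max_entropy:
        return "unknown"
    inferred = max(dist, key=dist.get)
    return "consistent" if inferred == reported else "suspect"


# Hypothetical age-group labels with call counts as weights.
calls = [("30s", 12), ("30s", 7), ("20s", 1)]
print(validate("30s", calls))
print(validate("50s", calls))
```

Under the homophily assumption, a profile whose reported value disagrees with a low-entropy neighbor distribution is a candidate for review; a high-entropy distribution carries too little signal to judge either way.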