Machine Learning-Powered Entity Resolution: A Scalable Approach for Real-Time Global Customer Matching

Authors

  • Veerababu Motamarri, Northern Illinois University, USA

DOI:

https://doi.org/10.47941/ijce.2995

Keywords:

Entity Resolution, Machine Learning, Fuzzy Matching, Real-time Data Integration, Customer Identity Management

Abstract

This article presents an approach to entity resolution (ER) that addresses the challenge of accurately unifying customer identities across disparate global data sources in real time. It introduces a hybrid record linkage system that overcomes the limitations of traditional rule-based approaches by combining deterministic blocking with fuzzy matching algorithms and supervised machine learning. The system uses Apache Spark's distributed processing alongside VoltDB's in-memory database technology to achieve both the accuracy and the performance required for enterprise-scale deployment. The methodology incorporates TF-IDF vectorization, Jaro-Winkler distance metrics, and logistic regression ensembles to generate calibrated match likelihood scores, so that decision thresholds can be set differently for different business contexts. Beyond the technical implementation, the article presents a framework for the operational challenges of deploying matching systems in regulated environments, including data quality monitoring, stakeholder engagement, and governance models that balance algorithmic consistency with business flexibility. Performance optimizations significantly reduced processing times while maintaining high match quality, enabling both efficient batch reconciliation and real-time matching during customer interactions. Self-monitoring and continuous learning allow the platform to adapt to changing data patterns rather than degrade over time. The article serves as both a technical blueprint and a strategic guide for organizations seeking to implement scalable, explainable, high-performance entity resolution systems in complex, global environments.
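
The matching pipeline named in the abstract (deterministic blocking, then fuzzy comparison, then a logistic score) can be illustrated with a minimal, standard-library Python sketch. It is not the article's implementation: the record fields, the blocking key, and the `WEIGHTS` values are hypothetical placeholders, a single hand-set logistic function stands in for a trained and calibrated ensemble, and TF-IDF features, Apache Spark, and VoltDB are left out to keep the example self-contained.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler similarity: boosts Jaro for a shared prefix (max 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def block_key(record):
    # Hypothetical deterministic blocking key: surname initial plus country.
    return (record["surname"][:1].upper(), record["country"])


def candidate_pairs(records):
    """Yield only pairs that share a blocking key, avoiding all-pairs comparison."""
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical hand-set coefficients; a real system would learn and calibrate these.
WEIGHTS = {"bias": -8.0, "name": 6.0, "email": 5.0}


def match_probability(a, b):
    """Logistic score over two comparison features (name similarity, email equality)."""
    name_sim = jaro_winkler(a["surname"].upper(), b["surname"].upper())
    email_eq = 1.0 if a["email"].lower() == b["email"].lower() else 0.0
    z = WEIGHTS["bias"] + WEIGHTS["name"] * name_sim + WEIGHTS["email"] * email_eq
    return 1.0 / (1.0 + math.exp(-z))


RECORDS = [
    {"id": 1, "surname": "Johnson", "country": "US", "email": "aj@example.com"},
    {"id": 2, "surname": "Jonson", "country": "US", "email": "AJ@example.com"},
    {"id": 3, "surname": "Jackson", "country": "US", "email": "bj@example.com"},
    {"id": 4, "surname": "Johnson", "country": "DE", "email": "cj@example.com"},
]

# Score every candidate pair; a business-specific threshold would be applied next.
scored = {(a["id"], b["id"]): match_probability(a, b)
          for a, b in candidate_pairs(RECORDS)}
```

In this sketch, blocking keeps record 4 out of every comparison because it falls in a different block, and the logistic score separates the near-duplicate pair (1, 2) from the unrelated pair (1, 3). Because the output is a probability rather than a yes/no decision, the acceptance threshold could be set differently for, say, batch reconciliation and real-time lookup.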

Published

2025-07-23

How to Cite

Motamarri, V. (2025). Machine Learning-Powered Entity Resolution: A Scalable Approach for Real-Time Global Customer Matching. International Journal of Computing and Engineering, 7(13), 23–41. https://doi.org/10.47941/ijce.2995

Section

Articles