Machine Learning-Powered Entity Resolution: A Scalable Approach for Real-Time Global Customer Matching
DOI:
https://doi.org/10.47941/ijce.2995
Keywords:
Entity Resolution, Machine Learning, Fuzzy Matching, Real-time Data Integration, Customer Identity Management
Abstract
This article presents an approach to entity resolution (ER) that addresses the challenge of accurately unifying customer identities across disparate global data sources in real time. It introduces a hybrid record linkage system that overcomes the limitations of traditional rule-based approaches by combining deterministic blocking with fuzzy matching algorithms and supervised machine learning. The system uses Apache Spark's distributed processing alongside VoltDB's in-memory database technology to achieve the accuracy and performance required for enterprise-scale deployment. The methodology incorporates TF-IDF vectorization, Jaro-Winkler distance metrics, and logistic regression ensembles to generate calibrated match likelihood scores, so that decision thresholds can be set separately for different business contexts. Beyond the technical implementation, the article presents a framework for the operational challenges of deploying matching systems in regulated environments, including data quality monitoring, stakeholder engagement, and governance models that balance algorithmic consistency with business flexibility. Performance optimizations reduced processing times while maintaining match quality, enabling both efficient batch reconciliation and real-time matching during customer interactions. The system's self-monitoring and continuous learning capabilities allow it to adapt to changing data patterns rather than degrade over time. The article serves as both a technical blueprint and a strategic guide for organizations seeking to implement scalable, explainable, and high-performance entity resolution systems in complex, global environments.
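As a rough illustration of how the pieces named in the abstract fit together, the following minimal Python sketch chains deterministic blocking, a Jaro-Winkler name comparison, and a logistic scoring function. It is not the article's implementation: the record fields, the blocking key (country plus surname initial), and the hand-set weights are all hypothetical, a single logistic function stands in for the trained logistic regression ensemble, and TF-IDF, Spark, and VoltDB are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def block_key(record):
    # Hypothetical deterministic blocking key: country + surname initial.
    return (record["country"], record["surname"][:1].upper())


def candidate_pairs(records):
    """Yield only pairs that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical hand-set weights; a real system would learn these from
# labelled match / non-match pairs.
WEIGHTS = {"bias": -6.0, "name_sim": 7.0, "email_equal": 3.0}


def match_probability(a, b):
    """Logistic score over two comparison features."""
    name_sim = jaro_winkler(a["surname"].upper(), b["surname"].upper())
    email_equal = 1.0 if a["email"].lower() == b["email"].lower() else 0.0
    z = (WEIGHTS["bias"]
         + WEIGHTS["name_sim"] * name_sim
         + WEIGHTS["email_equal"] * email_equal)
    return 1.0 / (1.0 + math.exp(-z))


records = [
    {"id": 1, "surname": "Martha", "country": "US", "email": "m@example.com"},
    {"id": 2, "surname": "Marhta", "country": "US", "email": "m@example.com"},
    {"id": 3, "surname": "Meyer", "country": "US", "email": "k@example.com"},
    {"id": 4, "surname": "Martha", "country": "DE", "email": "m@example.com"},
]

scores = {(a["id"], b["id"]): match_probability(a, b)
          for a, b in candidate_pairs(records)}
```

In this toy data, records 1 and 2 (a transposed surname with the same email) score high, while record 4 is never compared with record 1 because its country places it in a different block. That illustrates the usual blocking trade-off: fewer comparisons at the cost of possibly missing cross-block matches. The score is compared against a threshold chosen per use case, which is the role the abstract assigns to calibrated match likelihoods.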
Copyright (c) 2025 Veerababu Motamarri

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.