Using Natural Language Processing (NLP) to Identify Fraudulent Healthcare Claims

Authors

  • Mani Joga Rao Cheekaramelli, Health Insurance Company (USA)

DOI:

https://doi.org/10.47941/ijce.2738

Keywords:

Artificial Intelligence, Healthcare Fraud Detection, Unstructured Data Analysis, Natural Language Processing.

Abstract

Purpose: This white paper makes the case for strengthening healthcare fraud detection by applying Natural Language Processing (NLP) to unstructured text such as physician notes, patient records, and claim descriptions. The objective is to overcome the limitations of traditional rule-based platforms in handling the complexity and scale of healthcare's unstructured data.

Methodology: The proposed approach combines well-established pre-trained NLP models (BioBERT and ClinicalBERT) with known methods such as named entity recognition, anomaly detection, and predictive modeling. The implementation strategy follows a phased approach to deploying NLP models in clinical IT environments, from data ingestion and transformation through model deployment and live fraud surveillance.
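The phases named in the methodology can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: a small hypothetical keyword lexicon stands in for a BioBERT or ClinicalBERT entity recognizer, a median-based modified z-score stands in for the anomaly detection step, and the function names, sample claims, and the 3.5 threshold are all illustrative choices.

```python
import re
from statistics import median

# Hypothetical lexicon standing in for a trained NER model. The paper names
# BioBERT and ClinicalBERT; neither is loaded in this sketch.
LEXICON = {
    "PROCEDURE": ["mri", "x-ray", "biopsy", "office visit"],
    "DIAGNOSIS": ["fracture", "sprain", "diabetes"],
}


def ingest(raw_claims):
    """Ingestion and transformation: lowercase notes and collapse whitespace."""
    return [
        {**c, "note": re.sub(r"\s+", " ", c["note"]).strip().lower()}
        for c in raw_claims
    ]


def extract_entities(note):
    """Entity tagging by whole-word lexicon lookup (a stand-in for NER)."""
    found = []
    for label, terms in LEXICON.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", note):
                found.append((label, term))
    return found


def modified_z_scores(amounts):
    """Anomaly scores based on the median absolute deviation (MAD)."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # No spread around the median: treat every amount as unremarkable.
        return [0.0 for _ in amounts]
    return [0.6745 * (a - med) / mad for a in amounts]


def run_pipeline(raw_claims, threshold=3.5):
    """Surveillance step: tag entities and flag claims with outlying amounts."""
    claims = ingest(raw_claims)
    scores = modified_z_scores([c["amount"] for c in claims])
    return [
        {
            "claim_id": c["claim_id"],
            "entities": extract_entities(c["note"]),
            "score": s,
            "flagged": abs(s) > threshold,
        }
        for c, s in zip(claims, scores)
    ]


# Made-up sample data for illustration only.
SAMPLE_CLAIMS = [
    {"claim_id": "C1", "note": "Office visit  for ankle sprain", "amount": 120.0},
    {"claim_id": "C2", "note": "X-ray confirms wrist fracture", "amount": 130.0},
    {"claim_id": "C3", "note": "Office visit, diabetes follow-up", "amount": 125.0},
    {"claim_id": "C4", "note": "Office visit for knee sprain", "amount": 115.0},
    {"claim_id": "C5", "note": "Office visit for ankle sprain", "amount": 900.0},
]

if __name__ == "__main__":
    for row in run_pipeline(SAMPLE_CLAIMS):
        print(row["claim_id"], row["entities"], row["flagged"])
```

In a real deployment the lexicon lookup would be replaced by a fine-tuned clinical language model, and the anomaly score would likely be computed per procedure or provider group rather than across all claims at once.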

Findings: According to the studies' results, NLP systems increase fraud detection accuracy by 30 percent, reduce false positives by 20 percent, and allow claims to be processed in under a second. The white paper proposes a hybrid solution that combines NLP-driven text analysis with existing rule-based systems; this combination delivers a stronger and more flexible means of fraud detection. The predictive nature of NLP enables healthcare organizations to identify potential fraud risks for providers before the issues grow worse.
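One way to picture the hybrid idea is a decision function that consults both a rule check and a text-derived risk score. The sketch below is a hedged illustration only: the rules, the phrase weights (a stand-in for a trained text classifier), the thresholds, and the three decision labels are assumptions introduced here and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    amount: float
    procedure_code: str
    note: str


def rule_flags(claim):
    """Hypothetical rule-based checks of the kind a legacy system might run."""
    flags = []
    if claim.amount > 10_000:
        flags.append("amount_over_limit")
    if not claim.procedure_code:
        flags.append("missing_procedure_code")
    return flags


# Hypothetical phrase weights standing in for an NLP model's risk output.
SUSPICIOUS_PHRASES = {
    "no documentation": 0.5,
    "patient not seen": 0.6,
    "billed in error": 0.3,
}


def text_risk(note):
    """Sum the weights of matched phrases and cap the result at 1.0."""
    lowered = note.lower()
    score = sum(w for phrase, w in SUSPICIOUS_PHRASES.items() if phrase in lowered)
    return min(score, 1.0)


def hybrid_decision(claim, text_threshold=0.5):
    """Combine both signals: agreement escalates, a single signal triggers review."""
    flags = rule_flags(claim)
    risk = text_risk(claim.note)
    if flags and risk >= text_threshold:
        decision = "escalate"
    elif flags or risk >= text_threshold:
        decision = "review"
    else:
        decision = "approve"
    return {
        "claim_id": claim.claim_id,
        "rule_flags": flags,
        "text_risk": risk,
        "decision": decision,
    }


if __name__ == "__main__":
    # Made-up claims for illustration only.
    examples = [
        Claim("A1", 250.0, "99213", "Routine follow-up, vitals stable."),
        Claim("A2", 250.0, "99213", "Patient not seen; service billed anyway."),
        Claim("A3", 15000.0, "99213", "No documentation on file, patient not seen."),
    ]
    for c in examples:
        print(hybrid_decision(c))
```

Keeping the two signals separate until the final step means either one can send a claim to review on its own, while agreement between them raises the priority. This is one possible reading of how text analysis and rules could complement each other.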

Unique Contribution to Theory, Practice and Policy: The paper calls on IT personnel to lead the adoption of NLP systems, refresh models to meet new fraud threats, and explore integration with federated learning and blockchain to strengthen protections and compliance. By implementing these recommendations, healthcare organizations will be able to deal with fraudulent activity more effectively and streamline their workflows.

Author Biography

Mani Joga Rao Cheekaramelli, Health Insurance Company (USA)

Independent Researcher, Lead Engineer

Published

2025-05-20

How to Cite

Cheekaramelli, M. J. R. (2025). Using Natural Language Processing (NLP) to Identify Fraudulent Healthcare Claims. International Journal of Computing and Engineering, 7(3), 34–53. https://doi.org/10.47941/ijce.2738

Section

Articles