Research Article

Prescriptive Analytics on Anonymized Patient Data Using Regression and Distributed Computing

Authors

  • JAGADEESWAR ALAMPALLY, Software Development Manager, USA

Abstract

The growth of digital healthcare has produced an unprecedented increase in patient data generated from clinical records, monitoring devices, and large health information systems. Predictive analytics has become a highly effective method for converting such data into actionable insights that support early diagnosis, outcome prediction, and preventive care. However, patient information is sensitive, and processing it poses substantial privacy and security risks, especially in distributed and multi-institutional settings. This study explores regression-based predictive analytics on anonymized patient data using a distributed computing architecture. Built on machine learning workflows in Apache Spark, the proposed solution provides scalable data processing and effective model training with a low risk of privacy loss. Linear and regularized regression models are used to assess predictive performance under different privacy conditions. The study also examines the trade-off between predictive utility and data privacy in distributed healthcare analysis. The findings indicate that distributed regression models can deliver reliable predictive accuracy while supporting privacy-sensitive data analysis, making them suitable for large-scale healthcare decision-support systems.
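The idea of fitting linear and regularized regression over data held in separate partitions can be illustrated with a minimal sketch. It is not the study's Spark pipeline: it is a plain-Python stand-in in which each partition computes only aggregate statistics (X^T X and X^T y), which are then merged and solved centrally. The function names, the penalty parameter `lam`, and the toy records (which carry no identifiers) are all assumptions made for illustration.

```python
from functools import reduce

def partition_stats(rows):
    """Return (X^T X, X^T y) for one partition; rows are (features, target)."""
    d = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(d):
            xty[i] += x[i] * target
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def merge_stats(a, b):
    """Add the aggregate statistics of two partitions element-wise."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(partitions, lam=0.0):
    """Fit [intercept, slopes...]; lam=0 gives ordinary least squares."""
    xtx, xty = reduce(merge_stats, (partition_stats(p) for p in partitions))
    for i in range(1, len(xty)):  # the intercept is left unpenalized
        xtx[i][i] += lam
    return solve(xtx, xty)

# Hypothetical de-identified records split across two sites: y = 2 + 3x.
partitions = [
    [((0.0,), 2.0), ((1.0,), 5.0)],
    [((2.0,), 8.0), ((3.0,), 11.0)],
]
coef_ols = fit_ridge(partitions, lam=0.0)
coef_ridge = fit_ridge(partitions, lam=1.0)
print(coef_ols, coef_ridge)
```

Only the summed statistics leave each partition, so no individual record is exchanged. The ridge penalty shrinks the slope relative to the unpenalized fit, which is one simple way to see how regularization affects the fitted model.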

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

4 (1)

Pages

107-111

Published

2022-03-25

How to Cite

JAGADEESWAR ALAMPALLY. (2022). Prescriptive Analytics on Anonymized Patient Data Using Regression and Distributed Computing. Journal of Computer Science and Technology Studies, 4(1), 107-111. https://doi.org/10.32996/jcsts.2022.4.1.13

Keywords:

Predictive analytics; anonymized data; regression; distributed computing