Prescriptive Analytics on Anonymized Patient Data Using Regression and Distributed Computing
Abstract
The growth of digital healthcare has produced an unprecedented volume of patient data from clinical records, monitoring devices, and large-scale health information systems. Predictive analytics has become an effective way to convert these data into actionable insights that support early diagnosis, outcome prediction, and preventive care. However, patient information is sensitive, and processing it poses substantial privacy and security risks, especially in distributed and multi-institutional settings. This study explores regression-based predictive analytics on anonymized patient data using a distributed computing architecture. Built on machine learning workflows in Apache Spark, the proposed solution provides scalable data processing and efficient model training with a low risk of privacy loss. Linear and regularized regression models were used to evaluate predictive performance under different privacy conditions. The study also examines the trade-off between predictive utility and data privacy in distributed healthcare analytics. The findings indicate that distributed regression models can achieve predictive accuracy at acceptable levels of reliability while supporting privacy-sensitive data analysis, which makes them suitable for large-scale healthcare decision-support systems.
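As a rough illustration of the idea summarized above, the following minimal sketch fits linear and ridge (L2-regularized) regression across data partitions by sharing only aggregated sufficient statistics rather than raw rows. It is not the study's Apache Spark implementation: the function names (`local_stats`, `merge_stats`, `fit_ridge`), the pure-Python solver, and the tiny synthetic dataset are all assumptions made for illustration, and aggregation alone is not claimed to provide any formal privacy guarantee.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (feature vector, target); hypothetical layout
Stats = Tuple[List[List[float]], List[float]]


def local_stats(rows: Sequence[Row]) -> Stats:
    """Compute X^T X and X^T y for one partition, with a leading intercept column."""
    d = len(rows[0][0]) + 1
    gram = [[0.0] * d for _ in range(d)]
    moment = [0.0] * d
    for features, target in rows:
        x = [1.0, *features]
        for i in range(d):
            moment[i] += x[i] * target
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return gram, moment


def merge_stats(stats: Sequence[Stats]) -> Stats:
    """Sum per-partition statistics; only these aggregates leave a partition."""
    d = len(stats[0][1])
    gram = [[0.0] * d for _ in range(d)]
    moment = [0.0] * d
    for g, m in stats:
        for i in range(d):
            moment[i] += m[i]
            for j in range(d):
                gram[i][j] += g[i][j]
    return gram, moment


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_ridge(partitions: Sequence[Sequence[Row]], lam: float = 0.0) -> List[float]:
    """Return [intercept, coefficients...]; lam = 0 gives ordinary least squares."""
    gram, moment = merge_stats([local_stats(p) for p in partitions])
    for i in range(1, len(moment)):  # leave the intercept unpenalized
        gram[i][i] += lam
    return solve(gram, moment)


# Synthetic data (not from the study): y = 2 + 3x, split over two partitions.
partitions = [
    [([0.0], 2.0), ([1.0], 5.0)],
    [([2.0], 8.0), ([3.0], 11.0)],
]
ols = fit_ridge(partitions, lam=0.0)
ridge = fit_ridge(partitions, lam=5.0)
```

On this synthetic data the unregularized fit recovers the generating line, while the ridge fit shrinks the slope toward zero. In a Spark setting the per-partition step would correspond to a map over partitions and the merge step to a reduce, although the study's actual pipeline is not described at that level of detail here.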
Article information
Journal
Journal of Computer Science and Technology Studies
Volume (Issue)
4 (1)
Pages
107-111
Published
Copyright
Copyright (c) 2022 https://creativecommons.org/licenses/by/4.0/
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
