Research Article

Reinforcement Learning-Driven Fault Recovery in Cloud-Native Data Integration Architectures

Authors

  • Annapurneswar Putrevu, Independent Researcher, USA

Abstract

Modern data integration pipelines face growing challenges from schema drift, resource bottlenecks, and unexpected, anomalous data, which often lead to system failures and service interruptions. Traditional rule-based recovery approaches are ineffective in dynamic cloud environments because they are largely manual and slow enough to cause significant downtime. The paper proposes the first framework that uses reinforcement learning agents (RLAs) to give data integration systems self-healing capabilities. The architecture combines real-time anomaly detection with an intelligent root cause analysis engine so that RLAs learn appropriate recovery strategies from past incidents and historical pipeline behavior. The RLAs can autonomously adjust resource allocations, reconfigure workflows, remap schemas, or perform intelligent retries. Experiments in Kubernetes-based environments show significant improvements in pipeline reliability, recovery time, and service uptime. The paper provides evidence for moving toward adaptive, holistic, self-healing data engineering that needs less human involvement, favoring robust systems that learn and act in a cloud ecosystem while supporting both scalability and resilience.
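
To make the idea of an agent learning recovery actions from past incidents concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy simulator in which each fault type has exactly one effective recovery action, and it uses a tabular, single-step value update with epsilon-greedy exploration (effectively a contextual bandit). All names here (FAULTS, ACTIONS, FIX, simulate, train) are hypothetical and chosen only to mirror the examples in the abstract.

```python
import random

# Hypothetical fault types and recovery actions, named after the abstract's examples.
FAULTS = ["schema_drift", "resource_bottleneck", "transient_error"]
ACTIONS = ["remap_schema", "scale_resources", "retry"]

# Assumption for the toy simulator only: which action resolves which fault.
FIX = {
    "schema_drift": "remap_schema",
    "resource_bottleneck": "scale_resources",
    "transient_error": "retry",
}


def simulate(fault, action):
    """Toy environment: reward +1 if the action resolves the fault, else -1."""
    return 1.0 if FIX[fault] == action else -1.0


def train(episodes=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Learn a value estimate for every (fault, action) pair."""
    rng = random.Random(seed)
    q = {(f, a): 0.0 for f in FAULTS for a in ACTIONS}
    for _ in range(episodes):
        fault = rng.choice(FAULTS)
        if rng.random() < epsilon:
            # Explore: try a random recovery action.
            action = rng.choice(ACTIONS)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(ACTIONS, key=lambda a: q[(fault, a)])
        reward = simulate(fault, action)
        # Single-step update: each incident ends after one recovery action,
        # so there is no bootstrapped next-state term.
        q[(fault, action)] += alpha * (reward - q[(fault, action)])
    return q


q_table = train()
policy = {f: max(ACTIONS, key=lambda a: q_table[(f, a)]) for f in FAULTS}
print(policy)
```

In a real pipeline the reward would presumably come from observed outcomes such as recovery time or restored throughput rather than a fixed lookup, and recovery may take several steps, which would call for a full multi-step reinforcement learning formulation.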

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

7 (9)

Pages

508-515

Published

2025-09-12

How to Cite

Annapurneswar Putrevu. (2025). Reinforcement Learning-Driven Fault Recovery in Cloud-Native Data Integration Architectures. Journal of Computer Science and Technology Studies, 7(9), 508-515. https://doi.org/10.32996/jcsts.2025.7.9.58

Keywords:

Reinforcement learning, self-healing systems, data integration pipelines, fault tolerance, cloud-native architectures