Authors: Srinivasa Chakravarthy Seethala
Abstract: Autonomous microservice scaling has become a central challenge for large enterprise platforms as rising API workloads, unpredictable traffic bursts, and complex interservice dependencies exceed the capabilities of static threshold rules and heuristic-based autoscaling strategies. This study examines how deep reinforcement learning can serve as an adaptive decision layer for continuous resource optimization, enabling platforms to maintain stable performance while minimizing operational overhead. The purpose of the research is to design and evaluate a deep reinforcement learning model that learns optimal scaling behaviors at both the microservice and API endpoint levels by observing latency patterns, queue depths, call graph interactions, and container performance indicators. A mixed-methods approach is used, combining quantitative experiments on simulated large-scale workloads with qualitative assessments of model behavior, action stability, and interpretability patterns across varying API conditions. The findings demonstrate that agents trained with actor-critic architectures significantly outperform rule-based and predictive autoscalers in maintaining low tail latency, reducing aggressive scale-outs, and stabilizing throughput during complex dependency shifts. The study introduces an architectural blueprint that integrates policy networks with real-time telemetry streams and platform orchestration layers, offering a scalable path for intelligent operational autonomy within enterprise environments. The research contributes to academic discourse by extending reinforcement learning applications to fine-grained API optimization rather than coarse infrastructure control, while providing industry practitioners with strategies to manage rising platform complexity.
The conclusion highlights that deep reinforcement learning can serve as a foundation for future self-regulating enterprise architectures in which scaling, traffic shaping, and resource allocation operate in a cohesive intelligence loop without human intervention.
International Journal of Science, Engineering and Technology