Authors: Sudhir Vishnubhatla
Abstract: The acceleration of digital banking during the first half of the 2010s placed unprecedented demands on enterprise data pipelines. Transaction volumes grew exponentially, regulatory oversight became stricter, and customer expectations for instant services reshaped operational priorities. Legacy batch-oriented Extract–Transform–Load (ETL) systems, once adequate for daily reconciliation and reporting, increasingly failed to meet requirements for low latency, horizontal scalability, and embedded compliance. By 2016, the convergence of distributed open-source frameworks such as Apache Kafka, Spark Streaming, and Flink with early cloud-native services such as Amazon Kinesis, AWS Lambda, and Google Cloud Dataflow made it possible to design a new generation of resilient and modular pipelines. This article situates these developments in the context of banking, a domain that uniquely balances throughput efficiency with legal and regulatory obligations. By synthesizing case evidence and architectural advances prior to mid-2016, it proposes a reference architecture that unites ingestion, processing, orchestration, and compliance within a cloud-native design.