Real-Time Data Processing Using Apache Kafka: Architecture, Implementation, and Performance Evaluation

4 Dec

Authors: Dr. C.K. Gomathy, MVR Nikhil, S Sriharsha

Abstract: Apache Kafka has emerged as the industry-standard platform for real-time data streaming and large-scale event processing. With increasing digitalization across IoT, finance, e-commerce, and healthcare, organizations require high-throughput, fault-tolerant systems capable of handling millions of data events per second. This research investigates the role of Apache Kafka as the backbone of modern streaming architectures and evaluates its performance within a big data analytics pipeline. A detailed literature review identifies advancements in Kafka Streams, Kafka Connect, tiered storage, and exactly-once processing. The proposed methodology integrates Kafka with Apache Spark Structured Streaming to build a scalable real-time anomaly detection system. Experiments executed on a clustered environment show substantial improvements in throughput, latency, and overall reliability. The results validate Kafka’s effectiveness as a distributed commit log enabling real-time analytics at scale. The paper concludes with future directions such as serverless Kafka, AI-driven topic optimization, and cloud-native streaming enhancements.
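To make the anomaly-detection methodology concrete, the per-batch detection logic such a pipeline might apply (for example inside a Spark Structured Streaming micro-batch) can be illustrated with a simple rolling z-score test. The sketch below is a hypothetical, self-contained illustration; the function name `detect_anomalies`, the window size, and the threshold are assumptions for exposition, not the detector actually evaluated in the paper:

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(events, window=50, threshold=3.0):
    """Flag events whose value deviates more than `threshold` standard
    deviations from the rolling mean of the last `window` values.

    `events` is an iterable of (timestamp, value) pairs, as might be
    decoded from a Kafka topic. Hypothetical sketch, not the paper's
    actual detector.
    """
    recent = deque(maxlen=window)   # rolling window of recent values
    anomalies = []
    for ts, value in events:
        if len(recent) >= 2:        # need at least 2 points for a stdev
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((ts, value))
        recent.append(value)
    return anomalies
```

In a real deployment this logic would consume records from a Kafka topic (e.g. via Spark's Kafka source) rather than an in-memory list; the statistical test itself is unchanged.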

DOI: http://doi.org/10.5281/zenodo.17813965