An AWS-Driven Intelligent Framework For Scalable Data Deduplication And Storage Optimization

6 Apr

Authors: Prof. Dr. Y. Jayababu, Chikkala Kedareswari Kaivalya, Darna Mahathi, Sai Sri Ram, Kalepu Dhanush Sai

Abstract: Cloud storage systems are experiencing rapid growth due to the increasing demand for scalable and cost-effective data management solutions. However, redundant data storage leads to excessive storage consumption, increased bandwidth usage, and higher operational costs. This paper proposes a data deduplication framework using Amazon Web Services (AWS) to eliminate duplicate files in cloud environments. The system generates a unique hash value using the MD5 hashing algorithm whenever a file is uploaded to Amazon S3. AWS Lambda functions compute and compare hash values stored in Amazon DynamoDB to identify duplicate files. If a duplicate is detected, the system prevents redundant storage and instead maintains a reference to the original file, thereby optimizing storage utilization. Experimental evaluation demonstrates improved storage efficiency, reduced memory consumption, and stable server response time. The proposed approach enhances cloud storage performance while maintaining data integrity and security.
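The hash-and-compare check at the heart of the framework can be sketched as follows. This is a minimal illustration, not the authors' implementation: in-memory dictionaries stand in for the DynamoDB hash table and the S3 bucket, and all names (`hash_index`, `object_store`, `upload`) are hypothetical.

```python
import hashlib

hash_index = {}    # stand-in for DynamoDB: MD5 hex digest -> key of original object
object_store = {}  # stand-in for Amazon S3: object key -> file bytes

def upload(key: str, data: bytes) -> str:
    """Store `data` under `key`, or return the key of an identical
    object uploaded earlier (the deduplicated reference)."""
    digest = hashlib.md5(data).hexdigest()
    if digest in hash_index:
        # Duplicate detected: skip storage, keep a reference to the original.
        return hash_index[digest]
    hash_index[digest] = key
    object_store[key] = data
    return key

first = upload("report-v1.pdf", b"quarterly numbers")
second = upload("report-copy.pdf", b"quarterly numbers")  # identical content
```

In the deployed system this logic would run inside an AWS Lambda function triggered by an S3 upload event, with DynamoDB providing the persistent hash lookup; here the second upload returns the original key and nothing extra is stored.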