Authors: Sanchayan Ghosh
Abstract: Large Language Models (LLMs) are increasingly being used in today’s real world artificially intelligent applica- tions, but they are still vulnerable to prompt leakage attacks which results in intellectual system prompt extraction. PLeak is such an advanced attack framework, where attackers try to obtain the system prompt through the model responses by carefully crafting adversarial queries iteratively. Currently available defense strategies, such as alignment based filtering, subspace constraining and casual isolation, are only partially effective against adaptive attacks, replay attacks and semantically obfuscated attacks. In this paper, we propose TotalShield, a comprehensive multi layered defnse system that aims that aims to provide strong immunity to prompt leakage attacks without solely depending on fine tuning the models. TotalShield is a multi layered defense system that consistes of seven synergistic protection layers: Prompt sanitization, Prompt fingerprinting, Neuro-guard transformation, concept masking, Leakage scoring, Adversarial behavior detection and Post generation rewriting. Unlike traditional single layer defense frameworks the proposed system combines lexical filtering, behavioral heuristics, semantic inspection, and response abstraction within a unified pipeline. To test robustness, we conduct iterative PLeak attacks and measure system performance using four quantitative metrics: Exact Match (EM), Substring Match (SM), Edit Distance (ED), and Semantic Similarity (SS). Our experimental result show that TotalShield preserves zero exact and partial leakage while maximizing semantic distance from protected system content. TotalShield is a deployable security solution for protecting system prompts in LLM services. By integrating deterministic security with semantic risk analysis, TotalShield pushes the frontiers of prompt leakage defense towards fully hardened systems.
International Journal of Science, Engineering and Technology