Cryptographically Verifiable Retrieval-Augmented Generation: A Tripartite Architecture for Decentralized Provenance, Compute-to-Data Privacy, and Automated Fact-Checking

17 Apr

Authors: Sheetal Laroiya, Purnendu kumar Ghosh, Ketan, Subhanshu Raj

Abstract: Contemporary Large Language Models (LLMs) deployed within Retrieval-Augmented Generation (RAG) pipelines suffer from three distinct vulnerabilities: epistemological opacity (hallucinations), non-consensual data exploitation, and a lack of granular provenance. This paper proposes a tripartite architecture to address these deficits. First, we introduce an Indexer and Provenance Layer that uses Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), and on-chain mapping to establish immutable audit trails for retrieved context. Second, we present a Privacy-Preserving Compute-to-Data paradigm that uses tokenized access control to enable economic incentives without exposing raw data to consumers. Finally, we formalize a Verifiable RAG Pipeline equipped with a multi-strategy Verifier Agent that autonomously audits LLM output against cryptographically anchored evidence.
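As an illustration of what "cryptographically anchored evidence" could mean in practice, the following minimal Python sketch hashes each retrieved chunk, records the hash against a publisher DID, and later checks that the text cited by a generated answer still matches its anchored hash. It is a sketch under stated assumptions and not the architecture's actual implementation: the in-memory dictionary stands in for the on-chain mapping, and the names `ProvenanceRecord`, `InMemoryRegistry`, `audit_citations`, and the `did:example:` identifier are hypothetical. Only the hash-matching strategy is shown; Verifiable Credential signatures, tokenized access control, and the other verifier strategies are omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record linking a content hash to its publisher's DID."""
    publisher_did: str
    content_hash: str


def content_hash(text: str) -> str:
    """Return the SHA-256 digest of the chunk text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Assumption: a plain dict standing in for the on-chain mapping."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    def anchor(self, chunk_id: str, text: str, publisher_did: str) -> ProvenanceRecord:
        """Record the hash of a chunk at indexing time."""
        record = ProvenanceRecord(publisher_did, content_hash(text))
        self._records[chunk_id] = record
        return record

    def lookup(self, chunk_id: str) -> Optional[ProvenanceRecord]:
        """Return the anchored record for a chunk, or None if unknown."""
        return self._records.get(chunk_id)

    def verify(self, chunk_id: str, text: str) -> bool:
        """True only if the chunk is anchored and its text is unmodified."""
        record = self.lookup(chunk_id)
        return record is not None and record.content_hash == content_hash(text)


def audit_citations(registry: InMemoryRegistry, cited: Dict[str, str]) -> Dict[str, bool]:
    """Check every chunk cited by a generated answer against the registry."""
    return {chunk_id: registry.verify(chunk_id, text) for chunk_id, text in cited.items()}
```

In this sketch, a chunk anchored with `registry.anchor(...)` passes `verify` only while its text is byte-for-byte unchanged; an altered chunk or an unknown chunk identifier fails the audit, which is the property an audit trail for retrieved context relies on.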