Authors: Sheetal Laroiya, Purnendu Kumar Ghosh, Ketan, Subhanshu Raj
Abstract: Contemporary Large Language Models (LLMs) deployed within Retrieval-Augmented Generation (RAG) pipelines suffer from three distinct vulnerabilities: epistemological opacity (hallucinations), non-consensual data exploitation, and a lack of granular provenance. This paper proposes a tripartite architecture to resolve these deficits. First, we introduce an Indexer and Provenance Layer that uses Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), and on-chain mapping to establish immutable audit trails for retrieved context. Second, we present a privacy-preserving Compute-to-Data paradigm that leverages tokenized access control to enable economic incentivization without exposing raw data to consumers. Finally, we formalize a Verifiable RAG Pipeline equipped with a multi-strategy Verifier Agent that autonomously audits LLM generation against cryptographically anchored evidence.
International Journal of Science, Engineering and Technology