RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation (2024), Jin Chao

This work proposes RAGCache, a latency-optimized serving system tailored for RAG that leverages the retrieval pattern to organize and cache the intermediate states of retrieved knowledge in a knowledg

ACM Transactions on Computer Systems (2024), Jin Chao