Review of privacy protection technologies in retrieval-augmented generation


Keywords: retrieval-augmented generation; privacy protection; differential privacy; federated learning; synthetic data

CLC number: TP309    Document code: A    Article ID: 1001-3695(2026)04-003-0985-10

doi:10.19734/j.issn.1001-3695.2025.09.0308

Review of privacy protection technologies in retrieval-augmented generation

Liu Xiaoqian1†, Yuan Ming1,2, Qian Hanwei1,3, Gao Guangliang1, Wang Qun1 (1.Dept.ofoefaCyuritygsicestuea;lfe jingUniesitdenfaeuUst

Abstract: RAG systems face severe risks of sensitive information leakage throughout the entire retrieval and generation process. To systematically sort out their privacy threats and protection technologies, this study first explained the core principles and application scenarios of RAG systems and clarified their privacy definition. It systematically analyzed representative privacy attack models such as membership inference, implicit knowledge extraction, and knowledge poisoning. Then, it classified existing privacy protection technologies into four categories: differential privacy technologies based on noise injection, decentralized collaborative training frameworks based on federated learning, source-level privacy substitution schemes based on synthetic data, and encryption technologies covering the entire data lifecycle. By comparing the applicable scenarios, advantages and limitations of various technologies, this study aims to provide strategic references for privacy protection of RAG systems.

Key words: retrieval-augmented generation (RAG); privacy protection; differential privacy (DP); federated learning (FL); synthetic data

0 Introduction

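RAG is an innovative framework that combines external knowledge with the capabilities of conventional LLMs (large language models). Its core value lies in breaking down the barrier between the static data built into model parameters and the continuously updated, dynamic information of the real world: at run time it combines relevant information from the internal database with external information and then completes content generation.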
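The retrieve-then-generate flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the bag-of-words `embed` function stands in for a real encoder, `generate` is a hypothetical placeholder for an LLM call, and `KNOWLEDGE_BASE` holds invented sample passages. The sketch also shows that retrieved passages are copied verbatim into the prompt, which is one point in the pipeline where sensitive content in the knowledge base could surface.

```python
import math
from collections import Counter

# Invented sample passages (assumption); a real system would index private documents.
KNOWLEDGE_BASE = [
    "Patient record 17: diagnosis updated in March.",
    "The cafeteria opens at eight in the morning.",
    "Differential privacy adds calibrated noise to query results.",
]


def embed(text):
    """Toy bag-of-words vector; stands in for a learned text encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Place retrieved passages verbatim in front of the user question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """Hypothetical stand-in for an LLM call; it only reports the prompt size."""
    return f"[LLM output conditioned on {len(prompt)} prompt characters]"


def rag_answer(query, k=1):
    """Retrieve supporting passages, then generate from the augmented prompt."""
    passages = retrieve(query, KNOWLEDGE_BASE, k)
    return generate(build_prompt(query, passages))
```

Calling `rag_answer("What noise does differential privacy add?")` retrieves the passage about differential privacy and passes it, together with the question, to the placeholder generator.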
