Multi-Level Safety Filtering Method for LLM Generation Security
DOI: https://doi.org/10.63313/JCSFT.9025
Keywords: External Knowledge Base, Information Forgetting, Knowledge Poisoning Attacks, Large Language Models
Abstract
Large Language Models (LLMs) are limited by their static parametric knowledge, which can lead to inaccurate answers and prevents them from incorporating updated information. Retrieval-Augmented Generation (RAG) mitigates this by integrating external knowledge bases. However, existing RAG paradigms face two challenges that compromise the accuracy and security of generated content: information forgetting in long conversational contexts, and vulnerability to knowledge poisoning attacks. This research addresses both issues to enhance the capabilities of LLMs augmented with external knowledge. To tackle information forgetting during conversational query rewriting, we propose KwRewriter, a keyword-integrated query rewriting framework. It employs a document keyword extractor to enrich document representations and a query rewriter that fuses generated keywords with the rewritten query, emphasizing the user's core intent. To address security, we introduce MLSF, a multi-level safety filtering method that implements a three-stage defense: (1) vector-level coarse screening via K-means clustering; (2) fine screening that combines keyword consensus with BERTScore semantic matching; and (3) final fact verification using the LLM's internal knowledge. Experimental results on open-domain QA datasets demonstrate KwRewriter's effectiveness: it significantly improves retrieval metrics such as MRR and Recall@k over strong baselines, showing that it alleviates information loss. MLSF was validated on datasets subjected to knowledge poisoning attacks, where it substantially increased answer accuracy and reduced the attack success rate, especially under high poisoning ratios, confirming its robust defensive capability. In conclusion, this work enhances RAG systems by improving retrieval accuracy through context-aware query rewriting and generation security through hierarchical document filtering.
The proposed methods were integrated into a practical medical retrieval QA system, underscoring their applicability and value in providing reliable, knowledge-grounded responses. Future work will focus on optimizing keyword integration, improving the methods' generalizability, and exploring defenses against a broader range of RAG security threats.
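The three-stage MLSF cascade summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract gives no thresholds or decision rules, so every concrete rule below is an assumption. Stage 1 runs a plain 2-means clustering on precomputed embeddings and drops any cluster whose members are near-duplicates of one another (assuming that passages injected to target one query tend to look alike). Stage 2 keeps documents whose keywords agree with the majority keyword set and whose text is close to the query, using token-overlap F1 as a model-free stand-in for BERTScore. Stage 3 calls a caller-supplied `verify` function in place of asking the LLM to check the facts. All names (`Doc`, `coarse_screen`, `fine_screen`, `mlsf`, `toy_verify`, `KNOWN_ANSWERS`) and all thresholds are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Doc:
    text: str
    vec: list[float]  # embedding, assumed precomputed by some upstream encoder
    keywords: set[str] = field(default_factory=set)  # assumed extracted upstream

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vecs: list[list[float]], k: int = 2, iters: int = 20) -> list[int]:
    """Lloyd's k-means on squared Euclidean distance, seeded with the first k points."""
    centers = [list(v) for v in vecs[:k]]

    def nearest(v: list[float]) -> int:
        return min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))

    labels: list[int] = []
    for _ in range(iters):
        labels = [nearest(v) for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def coarse_screen(docs: list[Doc], max_mean_sim: float = 0.95) -> list[Doc]:
    """Stage 1 (assumed rule): 2-means on embeddings, then drop any cluster whose
    members are near-duplicates (mean pairwise cosine above max_mean_sim)."""
    if len(docs) < 3:
        return docs
    labels = kmeans([d.vec for d in docs], k=2)
    dropped = set()
    for c in set(labels):
        members = [d.vec for d, lab in zip(docs, labels) if lab == c]
        sims = [cosine(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
        if sims and sum(sims) / len(sims) > max_mean_sim:
            dropped.add(c)
    return [d for d, lab in zip(docs, labels) if lab not in dropped]

def token_f1(a: str, b: str) -> float:
    """Token-overlap F1: a model-free stand-in for BERTScore (NOT BERTScore itself)."""
    ta, tb = Counter(re.findall(r"\w+", a.lower())), Counter(re.findall(r"\w+", b.lower()))
    common = sum((ta & tb).values())
    if common == 0:
        return 0.0
    p, r = common / sum(ta.values()), common / sum(tb.values())
    return 2 * p * r / (p + r)

def fine_screen(query: str, docs: list[Doc], min_agree: float = 0.5,
                min_sim: float = 0.2) -> list[Doc]:
    """Stage 2 (assumed rule): keep docs whose keywords mostly fall in the keyword
    set shared by at least half the docs, and whose text is close enough to the query."""
    counts = Counter(kw for d in docs for kw in d.keywords)
    consensus = {kw for kw, n in counts.items() if n >= len(docs) / 2}
    kept = []
    for d in docs:
        agree = len(d.keywords & consensus) / len(d.keywords) if d.keywords else 0.0
        if agree >= min_agree and token_f1(query, d.text) >= min_sim:
            kept.append(d)
    return kept

def mlsf(query: str, docs: list[Doc], verify: Callable[[str, str], bool]) -> list[Doc]:
    """Cheap-to-expensive cascade; `verify` stands in for the LLM fact check."""
    docs = coarse_screen(docs)
    docs = fine_screen(query, docs)
    return [d for d in docs if verify(query, d.text)]

# Toy data: four clean passages plus three near-identical injected ones.
docs = [
    Doc("Paris is the capital of France.", [1.0, 0.0], {"paris", "capital", "france"}),
    Doc("The capital city of France is Paris.", [0.8, 0.6], {"paris", "capital", "city"}),
    Doc("It will rain in Paris tomorrow.", [0.6, 0.8], {"rain", "weather"}),
    Doc("The capital of France is Lyon.", [0.0, 1.0], {"capital", "france", "lyon"}),
] + [Doc("The capital of France is Berlin.", v, {"capital", "france", "berlin"})
     for v in ([0.0, -1.0], [0.01, -1.0], [0.0, -0.99])]

KNOWN_ANSWERS = {"what is the capital of france?": "paris"}  # hypothetical LLM stand-in

def toy_verify(query: str, text: str) -> bool:
    answer = KNOWN_ANSWERS.get(query.lower())
    return answer is not None and answer in text.lower()

survivors = mlsf("What is the capital of France?", docs, toy_verify)
print([d.text for d in survivors])
# prints ['Paris is the capital of France.', 'The capital city of France is Paris.']
```

On the toy data, stage 1 removes the three near-identical "Berlin" passages, stage 2 removes the off-topic weather passage, and stage 3 rejects the "Lyon" passage that the stand-in verifier knows to be wrong. Placing the LLM check last means the most expensive step only sees documents that passed the cheaper filters. The first-k-points seeding in `kmeans` is kept for determinism; k-means++ seeding is the more usual choice in practice.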
License
Copyright (c) 2025 by author(s) and Erytis Publishing Limited.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.