Research on Optimization of Large Model Code Generation Based on Improved RAG in Embedded Environments
DOI: https://doi.org/10.63313/JCSFT.2002

Keywords: Embedded Environments, Retrieval-Augmented Generation, Large Models, Code Generation, Resource Optimization

Abstract
In embedded environments, limited computing power and tight memory make large model code generation prone to high response latency, memory overflow, and poor adaptability of the generated code. To address these problems, this paper proposes EmbedCode-RAG, an improved RAG architecture based on fine-grained compression and dual-dimensional retrieval. First, the architecture uses Abstract Syntax Tree (AST)-guided code snippet segmentation and lightweight block embedding compression to reduce the volume of reference code by a factor of 16 to 32. Second, a "task-situation" dual-dimensional retrieval mechanism is designed that integrates code function similarity with embedded hardware constraint features. Finally, a dynamic knowledge base update module enables progressive accumulation of generation experience. We conducted experiments on two hardware platforms, ARM Cortex-M7 and NVIDIA Jetson Nano, using a self-constructed embedded code corpus covering MCU drivers, IoT protocols, and related tasks. Experimental results show that, compared with the traditional RAG method, EmbedCode-RAG improves the CodeBLEU [20] score of generated code by 8.3%, reduces Time to First Token (TTFT) by 82.6%, and decreases memory usage by 79.2%, while running without crashes on low-power hardware. This research provides a new solution for efficient and reliable code generation in embedded scenarios.
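To make the "task-situation" dual-dimensional retrieval concrete, the Python sketch below combines a code-function similarity score with a hardware-constraint fit score and ranks candidate snippets by their weighted sum. It is a minimal illustration under stated assumptions: the field names (ram_kb, flash_kb, needs_fpu), the weighting factor alpha, and the helper functions are hypothetical and do not reflect the authors' implementation.

# Minimal sketch of "task-situation" dual-dimensional retrieval.
# All names and weights are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
import numpy as np

@dataclass
class CodeSnippet:
    text: str                    # AST-segmented code fragment
    task_embedding: np.ndarray   # compressed block embedding (task dimension)
    ram_kb: int                  # peak RAM the snippet assumes (situation dimension)
    flash_kb: int                # flash footprint the snippet assumes
    needs_fpu: bool              # whether the snippet relies on an FPU

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def situation_score(s: CodeSnippet, ram_kb: int, flash_kb: int, has_fpu: bool) -> float:
    # Score how well a snippet's hardware assumptions fit the target device (0..1).
    if s.needs_fpu and not has_fpu:
        return 0.0                                   # hard constraint violated
    ram_fit = min(1.0, ram_kb / max(s.ram_kb, 1))    # 1.0 means it fits comfortably
    flash_fit = min(1.0, flash_kb / max(s.flash_kb, 1))
    return 0.5 * ram_fit + 0.5 * flash_fit

def dual_dimensional_rank(query_emb, snippets, ram_kb, flash_kb, has_fpu,
                          alpha: float = 0.7, top_k: int = 3):
    # Rank snippets by alpha * task similarity + (1 - alpha) * situation fit.
    scored = []
    for s in snippets:
        task = cosine(query_emb, s.task_embedding)
        situ = situation_score(s, ram_kb, flash_kb, has_fpu)
        scored.append((alpha * task + (1 - alpha) * situ, s))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]

In this sketch, alpha trades off functional relevance against hardware fit, and a snippet that violates a hard constraint (for example, requiring an FPU the target lacks) receives a situation score of zero, so it is effectively filtered out of the retrieved context.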
References
[1] YAO S, YU D, ZHAO J, et al. Tree of Thoughts: Deliberate problem solving with large language models[J]. arXiv preprint arXiv:2305.10601, 2023.
[2] CHEN X, LIU Y, WANG J, et al. Resource-efficient deployment of large language models for embedded code generation[J]. IEEE Transactions on Embedded Computing Systems, 2024, 23(2): 1124-1136.
[3] LEWIS P, PEREZ E, PIKTUS A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks[C]//Advances in Neural Information Processing Systems 33. 2020: 9459-9474.
[4] NVIDIA Corporation. TensorRT quantization guide for deep learning inference[EB/OL]. https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html, 2023.
[5] LAMB A, GAO Y, MARTINEZ C, et al. TinyLLaMA: An efficient small language model for edge devices[J]. arXiv preprint arXiv:2310.06764, 2023.
[6] XU Z, YANG J, LIU Z, et al. REFRAG: Efficient retrieval-augmented generation with block compression[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023: 7890-7902.
[7] ZHANG S, WANG Y, CHEN W, et al. P-RAG: Progressive retrieval-augmented generation for embodied task planning[J]. IEEE Transactions on Robotics, 2024, 40(1): 568-582.
[8] WANG H, LIU C, ZHANG J, et al. CodeRAG: Retrieval-augmented code generation for software development[J]. Journal of Systems and Software, 2023, 201: 111689.
[9] GUO D, YU A, PARULKAR S, et al. Quantization-aware training for low-precision code generation models[C]//Proceedings of the 2023 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. 2023: 245-268.
[10] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter[J]. arXiv preprint arXiv:1910.01108, 2019.
[11] XIAO G, TIAN Y, CHEN B, et al. Efficient streaming language models with attention sinks[J]. arXiv preprint arXiv:2309.17453, 2023.
[12] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. OpenAI, 2018.
[13] CHASE H. LangChain: Building applications with large language models[EB/OL]. https://langchain.com, 2022.
[14] NGUYEN T, DOAN T, PHAM H, et al. RAG-Fusion: Enhancing retrieval-augmented generation with multi-query fusion[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023: 10243-10255.
[15] CHEN X, ZHAO Y, LIU H, et al. CodeRetriever: An efficient code retrieval system for open-source repositories[J]. IEEE Transactions on Software Engineering, 2022, 48(12): 4689-4703.
[16] META AI. CodeLlama-RAG: Retrieval-augmented generation for code[EB/OL]. https://ai.meta.com/research/publications/codellama-rag/, 2023.
[17] ZHANG X, SUN Y, LI J, et al. CodeBERT-mini: A lightweight model for code understanding and generation[J]. Journal of Artificial Intelligence Research, 2024, 79: 345-378.
[18] JOHNSON J, DOUZE M, JÉGOU H. Billion-scale similarity search with GPUs (FAISS)[J]. arXiv preprint arXiv:1702.08734, 2017.
[19] LIU F, WANG Z, CHEN Y, et al. Hardware-aware fine-tuning for embedded code generation[C]//Proceedings of the 2024 Design, Automation & Test in Europe Conference & Exhibition. 2024: 1234-1239.
[20] REN S, GUO D, LU S, et al. CodeBLEU: A method for automatic evaluation of code synthesis[J]. arXiv preprint arXiv:2009.10297, 2020.
License
Copyright (c) 2025 by author(s) and Erytis Publishing Limited.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.