EduRL-GPT: A Reinforcement Learning Optimized Generative AI Framework for Intelligent Teaching Content Generation and Personalized Feedback

Xueqi Tang; Sitong Liu

doi:10.63313/JCSFT.9072

Authors

Xueqi Tang, Wuhan University, Wuhan, China
Sitong Liu, University of Pennsylvania, Philadelphia, USA

Keywords:

Generative Artificial Intelligence, Intelligent Tutoring System, Personalized Learning, Reinforcement Learning, Large Language Models, Educational Technology

Abstract

The rapid advancement of generative artificial intelligence (generative AI), particularly large language models such as GPT, Claude, and Gemini, has significantly reshaped the design of intelligent tutoring systems by enabling automatic generation of instructional content, adaptive assessments, and personalized learning feedback. However, most existing generative AI-based educational systems rely on static prompting strategies and lack mechanisms to continuously optimize feedback quality according to students’ evolving learning states, which limits their effectiveness in personalized education scenarios. To address this limitation, this paper proposes EduRL-GPT, a reinforcement learning-optimized generative AI framework for intelligent teaching content generation and personalized feedback. EduRL-GPT integrates a state-aware generative teaching engine with a reinforcement learning-based feedback optimization module, allowing the system to dynamically generate explanations, quizzes, and learning suggestions tailored to individual students’ knowledge mastery, learning behaviors, and preferences. Specifically, a Proximal Policy Optimization (PPO) algorithm adaptively selects pedagogically appropriate feedback strategies based on observed learning gains and engagement signals, forming a closed-loop intelligent tutoring process. The proposed framework is evaluated on an online learning dataset with real student interaction logs. Experimental results show that EduRL-GPT significantly outperforms strong baselines. Compared with an LLM-based one-shot tutoring system, EduRL-GPT achieves a 12.8% improvement in learning gain and a relative increase of approximately 13.0% in student engagement, while also yielding higher feedback satisfaction and accuracy improvement. These results demonstrate that reinforcement learning-optimized generative AI can deliver more effective, stable, and personalized instructional feedback for intelligent tutoring and adaptive learning systems.
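
To make the closed loop described in the abstract more concrete, the following is a minimal sketch of how a PPO-style policy over feedback strategies might be scored. It is not the authors' implementation. The strategy names, the reward weights, and the helper functions `softmax`, `reward`, `sample_strategy`, and `ppo_clip_term` are all assumptions introduced for illustration; the abstract names only the two reward signals (learning gain and engagement) and the use of PPO. The clipped term follows the standard PPO surrogate objective.

```python
import math
import random

# Hypothetical action set: the abstract does not enumerate the feedback strategies.
STRATEGIES = ["hint", "worked_example", "quiz", "encouragement"]


def softmax(logits):
    """Turn raw policy scores into a probability distribution over strategies."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(learning_gain, engagement, w_gain=0.7, w_engage=0.3):
    """Assumed reward: a weighted sum of the two signals named in the abstract.

    The weights are illustrative placeholders, not values from the paper.
    """
    return w_gain * learning_gain + w_engage * engagement


def sample_strategy(probs, rng):
    """Sample one feedback strategy from the current policy distribution."""
    return rng.choices(STRATEGIES, weights=probs, k=1)[0]


def ppo_clip_term(new_prob, old_prob, advantage, eps=0.2):
    """Standard PPO clipped surrogate term for a single (state, action) sample.

    The probability ratio is clipped to [1 - eps, 1 + eps] so that one update
    cannot move the policy too far from the one that collected the data.
    """
    ratio = new_prob / old_prob
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    old_probs = softmax([0.0, 0.0, 0.0, 0.0])  # uniform policy before the update
    new_probs = softmax([0.5, 0.0, 0.0, 0.0])  # policy after a hypothetical update
    action = sample_strategy(old_probs, rng)
    idx = STRATEGIES.index(action)
    # Treat the raw reward as the advantage; a real system would subtract a baseline.
    adv = reward(learning_gain=0.4, engagement=0.6)
    print(action, ppo_clip_term(new_probs[idx], old_probs[idx], adv))
```

In this sketch the student's observed learning gain and engagement are folded into a single scalar reward, and the clipped term limits how strongly one interaction can shift the strategy-selection policy. A full system would also need a state encoder for the student's knowledge mastery and a value function to estimate advantages, neither of which is specified in the abstract.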
