Tag: Reinforcement Learning
All the articles with the tag "Reinforcement Learning".
-
Reinforcement Learning for LLM Reasoning Under Memory Constraints
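This paper proposes S-GRPO and T-SPMO, two memory-efficient, critic-free reinforcement learning methods. Combined with LoRA fine-tuning, they significantly improve the performance of large language models on mathematical reasoning tasks under limited hardware resources, with T-SPMO performing especially well on tasks that require fine-grained credit assignment.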
-
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
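This paper proposes a multi-stage training recipe consisting of large-scale distillation, rollout preference optimization, and reinforcement learning with verifiable rewards. The recipe significantly improves the performance of small language models on mathematical reasoning tasks, enabling the 3.8B-parameter Phi-4-Mini-Reasoning model to outperform open-source baseline models with nearly twice as many parameters.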
-
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.
-
Phi-4-reasoning Technical Report
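This paper develops the small LLMs Phi-4-reasoning and Phi-4-reasoning-plus through data-centric supervised fine-tuning and reinforcement learning, improving their performance on complex reasoning tasks and making them competitive with larger models.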