Tag: Reinforcement Learning

All the articles with the tag "Reinforcement Learning".

SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation

Published: 5 May, 2025 at 11:15 PM

64.11 🤔

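This paper proposes the SmallPlan framework, which combines LLM-guided distillation with SFT and RL driven by simulation-environment feedback to train lightweight small language models (SLMs) for efficient high-level robot path planning, achieving performance close to that of large language models (LLMs) on resource-constrained edge devices.
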
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

Published: 4 May, 2025 at 04:27 PM

60.29 🤔

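This paper proposes EPO, which uses reinforcement learning to optimize a dedicated strategic reasoning model that assists arbitrary LLM agents in aligning with long-term goals in dynamic environments, improving their strategic reasoning ability.
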
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning

Published: 4 May, 2025 at 04:27 PM

58.67 🤔

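This paper proposes Reason2Attack, which strengthens an LLM's reasoning ability through Frame Semantics-based CoT example synthesis and reinforcement learning with an attack process reward, so that it can efficiently generate adversarial prompts that jailbreak T2I models.
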
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Published: 4 May, 2025 at 04:32 PM

56.98 🤔

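This paper proposes DeepSeek-Prover-V2, which unifies informal and formal mathematical reasoning through subgoal decomposition and reinforcement learning, significantly improving neural theorem proving performance and reaching state-of-the-art results on multiple benchmarks.
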
Pushing the boundary on Natural Language Inference

Published: 4 May, 2025 at 04:30 PM

56.51 🤔

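This paper proposes combining Group Relative Policy Optimization with Chain-of-Thought learning to improve performance on natural language inference tasks; it requires no annotated reasoning paths and achieves state-of-the-art results on adversarial benchmarks through parameter-efficient fine-tuning.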
