Tag: Reinforcement Learning
All the articles with the tag "Reinforcement Learning".
-
Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework
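This paper proposes the LMGT framework, which uses the prior knowledge of large language models to dynamically adjust reinforcement learning rewards. The approach balances exploration and exploitation, significantly improves sample efficiency, and reduces training cost; it is validated across multiple environments and algorithms, as well as complex scenarios such as robotics and recommender systems.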
-
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
ARTIST, a novel framework unifying agentic reasoning, reinforcement learning, and tool integration, enables LLMs to autonomously orchestrate external tools within multi-turn reasoning, achieving up to 22% accuracy gains on complex math tasks and significant improvements in multi-turn function calling over baselines.
-
From System 1 to System 2: A Survey of Reasoning Large Language Models
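This paper surveys the evolution from foundational LLMs to reasoning LLMs, which integrate System 2 techniques to strengthen step-by-step reasoning, and reports significant performance improvements on benchmarks.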
-
ComPO: Preference Alignment via Comparison Oracles
This paper introduces ComPO, a novel preference alignment method for LLMs using comparison oracles to effectively utilize noisy preference pairs, demonstrating reduced verbosity and likelihood displacement across multiple models and benchmarks.
-
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
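This paper proposes a cache-efficient posterior sampling framework that reuses LLM priors through a meta-learned, optimized caching mechanism, significantly reducing the computational cost of reinforcement learning (3.8-4.7x fewer queries, 4.0-12.0x lower latency) while retaining 96-98% of performance on text and continuous control tasks.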