Tag: Reinforcement Learning

All the articles with the tag "Reinforcement Learning".

Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework

Published: 5 May, 2025 at 11:16 PM

76.51 🤔

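This paper proposes the LMGT framework, which uses the prior knowledge of large language models to dynamically adjust rewards in reinforcement learning. It effectively balances exploration and exploitation, significantly improves sample efficiency, and lowers training cost, and its effectiveness is validated across multiple environments and algorithms as well as complex scenarios such as robotics and recommender systems.
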
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Published: 13 May, 2025 at 11:12 AM

76.49 🤔

ARTIST, a novel framework unifying agentic reasoning, reinforcement learning, and tool integration, enables LLMs to autonomously orchestrate external tools within multi-turn reasoning, achieving up to 22% accuracy gains on complex math tasks and significant improvements in multi-turn function calling over baselines.

From System 1 to System 2: A Survey of Reasoning Large Language Models

Published: 4 May, 2025 at 04:26 PM

75.04 🤔

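This paper surveys the evolution from foundational LLMs to reasoning LLMs, which integrate System 2 techniques to improve step-by-step reasoning and show significant performance gains on benchmarks.
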
ComPO: Preference Alignment via Comparison Oracles

Published: 13 May, 2025 at 11:09 AM

73.73 🤔

This paper introduces ComPO, a novel preference alignment method for LLMs using comparison oracles to effectively utilize noisy preference pairs, demonstrating reduced verbosity and likelihood displacement across multiple models and benchmarks.

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains

Published: 18 May, 2025 at 11:16 AM

73.39 🤔

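This paper proposes a cache-efficient posterior sampling framework that reuses LLM priors through a caching mechanism optimized by meta-learning, significantly reducing the computational cost of reinforcement learning (3.8-4.7x fewer queries, 4.0-12.0x lower latency) while retaining 96-98% of performance on text and continuous control tasks.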
