Tag: Reasoning
All articles with the tag "Reasoning".
-
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
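R1-Searcher++ uses a two-stage training strategy (SFT and RL), combined with a reward mechanism and a memory module, to let large language models adaptively balance internal knowledge against external retrieval, significantly improving accuracy and retrieval efficiency on multi-hop question answering tasks.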
-
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
This paper introduces Temperature Scaling (TS) and Trace Length Control for Dynamic Reasoning (TLDR) to enhance token efficiency in small language models, achieving up to 50% reduction in response length with minimal accuracy loss across multiple reasoning benchmarks.
-
Skywork Open Reasoner 1 Technical Report
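Skywork-OR1 proposes the MAGIC framework, which uses multi-stage training and reinforcement learning with adaptive entropy control to significantly improve the performance of long chain-of-thought reasoning models on math and coding tasks, surpassing DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.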
-
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
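This paper introduces the phenomenon of contextual entrainment, revealing a mechanistic preference of language models for tokens that appear in the prompt, and uses a differentiable masking method to identify entrainment heads, offering a new perspective for understanding and mitigating distraction.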
-
Who Taught You That? Tracing Teachers in Model Distillation
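This paper proposes a method based on syntactic patterns (PoS templates) that identifies a student model's teacher from higher-order linguistic features of the student's outputs. Across multiple tasks and datasets it outperforms traditional similarity-based and perplexity-based methods, though its accuracy still leaves room for improvement.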