Tag: Reasoning

All the articles with the tag "Reasoning".

PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery

Published: 2 Jun, 2025 at 11:32 AM

87.10 🤔

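PASER proposes a post-training data selection method for recovering the capabilities of pruned large language models. By combining semantic clustering, capability-degradation-aware selection, and negative-effect mitigation, it significantly improves recovery performance and reduces computational cost under a limited data budget.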
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

Published: 21 May, 2025 at 11:23 AM

87.10 🤔

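SoftCoT++ achieves test-time scaling in a continuous latent space by introducing diverse initial tokens and contrastive learning. It significantly improves large language model performance on multiple reasoning tasks and shows synergy with conventional discrete-space scaling methods.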
Incentivizing Strong Reasoning from Weak Supervision

Published: 30 May, 2025 at 11:19 AM

87.07 🤔

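This paper proposes the weak-to-strong reasoning (W2SR) paradigm, which applies supervised fine-tuning to a strong student model using structured chain-of-thought trajectories generated by significantly weaker teacher models. The approach substantially improves the student's reasoning ability at low cost, approaching or even surpassing the results of expensive reinforcement learning.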
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Published: 28 May, 2025 at 11:22 AM

87.06 🤔

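AdaReasoner uses a reinforcement learning framework to adaptively adjust a large language model's reasoning configuration (generation temperature, number of reasoning steps, and instruction format). It significantly outperforms fixed-configuration baselines across diverse tasks and shows fast convergence and out-of-distribution robustness.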
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities

Published: 23 May, 2025 at 11:10 AM

87.03 🤔

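This paper proposes the MoL framework, a dual-loss optimization strategy that applies cross-entropy (CE) loss to the domain corpus and KL divergence loss to the general corpus. It significantly improves the domain expertise of large language models while effectively preserving their general capabilities, and achieves strong results on medical-domain tasks.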
