Tag: Reasoning

All the articles with the tag "Reasoning".

Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings

Published: 22 May, 2025 at 11:17 AM

93.37 🤔

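This paper proposes a two-stage training framework: a warm-up on the domain-agnostic Knights & Knaves logic game activates general reasoning ability, followed by RLVR training on a small amount of target-domain data. The approach significantly improves the reasoning performance and cross-domain generalization of large language models in resource-constrained settings.
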
Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer

Published: 4 Jun, 2025 at 11:59 AM

93.14 🤔

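By freezing Transformer components and introducing the MixiT model, this paper shows that input-dependent self-attention is necessary for retrieval and language modeling, that MLP layers play the dominant role in memorization, and that architectural heterogeneity matters for solving tasks.
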
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Published: 28 May, 2025 at 11:25 AM

93.11 🤔

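This paper proposes the Agent Distillation framework, which distills the interactive behavior of LLM agents into sLMs. Combined with the first-thought prefix and self-consistent action generation methods, it yields significant performance gains for small models on factual and mathematical reasoning tasks, approaching or even surpassing larger CoT-distilled models.
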
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

Published: 23 May, 2025 at 11:16 AM

93.01 🤔

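This paper proposes Self-Reasoning Language Models (SRLM), which use a small amount of reasoning catalyst data to guide the model to generate longer reasoning chains on its own and to self-train iteratively. The method achieves an average improvement of +2.5 percentage points across multiple reasoning benchmarks and shows potential for exploring deeper and more creative reasoning paths.
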
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Published: 22 May, 2025 at 11:16 AM

92.95 🤔

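This paper proposes the LATENTSEEK framework, which significantly improves the reasoning ability of large language models through policy-gradient-based test-time instance-level adaptation (TTIA) in latent space, while exploring a new direction for test-time scaling.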
