Posts
All the articles I've posted.
-
Don't be lazy: CompleteP enables compute-efficient deep transformers
This paper introduces CompleteP, a parameterization for transformers with α = 1, which ensures depth-wise hyperparameter transfer and complete feature learning, achieving 12-34% compute efficiency improvements and enabling a wider range of compute-optimal width-to-depth ratios.
-
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models
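This paper proposes AdaptMI and AdaptMI+, adaptive methods that use a reward model to detect question difficulty and select skill-based in-context examples only for difficult questions, improving the performance of small language models on math reasoning tasks while avoiding cognitive overload.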
-
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
This paper demonstrates through meta-analysis and experiments that Chain-of-Thought (CoT) prompting significantly enhances large language model performance on math and symbolic reasoning tasks, but offers limited benefits for non-symbolic tasks and underperforms compared to tool-augmented approaches.
-
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
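This paper proposes the Speculative Chain-of-Thought (SCoT) framework, in which a lightweight draft model generates multiple chain-of-thought drafts in parallel and a fine-tuned large target model either selects the best draft or decides to rethink, significantly reducing the reasoning latency of large language models while keeping accuracy close to that of the large model.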
-
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
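This paper proposes the StreamRL framework, which optimizes RL training through a disaggregated stream generation architecture, addressing pipeline bubbles and skewness bubbles and improving the throughput and cost efficiency of RL training for LLMs.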