Tag: Efficiency

All the articles with the tag "Efficiency".

Temporal Sampling for Forgotten Reasoning in LLMs

Published: 28 May, 2025 at 11:20 AM

92.01 🤔

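This paper identifies a "Temporal Forgetting" phenomenon in large language model fine-tuning and proposes "Temporal Sampling", which draws answers from multiple training checkpoints to substantially improve reasoning performance (Pass@k gains of 4 to 19 percentage points), and uses LoRA adaptation to reduce storage costs.
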
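The sampling idea can be illustrated with a minimal Python sketch. It assumes each checkpoint is exposed as a callable that returns one sampled answer for a prompt. `Sampler`, `temporal_sample`, and `solved_at_k` are hypothetical names, not the paper's code. The k answers are spread round-robin over the checkpoints instead of all being drawn from the final one, so an answer the final checkpoint has "forgotten" can still count toward Pass@k.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a checkpoint is any callable returning one sampled answer for a prompt.
Sampler = Callable[[str], str]


def temporal_sample(prompt: str, checkpoints: Sequence[Sampler], k: int) -> List[str]:
    """Draw k answers, assigning them round-robin across checkpoints
    (assumed ordered newest first) instead of taking all k from the final checkpoint."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    return [checkpoints[i % len(checkpoints)](prompt) for i in range(k)]


def solved_at_k(answers: Sequence[str], is_correct: Callable[[str], bool]) -> bool:
    """Per-problem Pass@k outcome: solved if any of the k answers is correct."""
    return any(is_correct(a) for a in answers)


# Toy checkpoints: the final one has "forgotten" an answer that an earlier one still gets right.
def final(prompt: str) -> str:
    return "41"


def earlier(prompt: str) -> str:
    return "42"


def check(answer: str) -> bool:
    return answer == "42"


print(solved_at_k(temporal_sample("6*7=?", [final], k=4), check))           # False
print(solved_at_k(temporal_sample("6*7=?", [final, earlier], k=4), check))  # True
```

Averaging `solved_at_k` over a set of problems gives the Pass@k figure. The LoRA-based storage reduction is not modeled here.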
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective

Published: 2 Jun, 2025 at 11:24 AM

91.96 🤔

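Sentinel proposes a lightweight, sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to reach compression rates of up to 5×, matching the QA performance of 7B-scale systems on the LongBench benchmark.
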
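Attention-based, sentence-level compression can be sketched in Python under simplifying assumptions. Per-token attention weights from the proxy model are assumed to be precomputed. Each sentence is scored by its mean attention, which stands in for the learned probe described in the paper. The highest-scoring sentences are kept until roughly 1/ratio of the context tokens remain. `compress_by_attention` is a hypothetical helper, not Sentinel's implementation.

```python
from typing import List, Sequence


def compress_by_attention(sentences: Sequence[str],
                          token_attention: Sequence[Sequence[float]],
                          ratio: float = 5.0) -> List[str]:
    """Keep the highest-scoring sentences so that about 1/ratio of the context
    tokens survive, preserving the original sentence order.

    token_attention[i] holds (assumed precomputed) attention weights that the
    proxy model's query tokens place on each token of sentence i."""
    lengths = [len(w) for w in token_attention]
    budget = sum(lengths) / ratio
    # Sentence relevance = mean attention over its tokens (a stand-in for a learned probe).
    scores = [sum(w) / len(w) if w else 0.0 for w in token_attention]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept, used = set(), 0
    for i in order:
        # Skip sentences that would exceed the budget, but always keep at least one.
        if used + lengths[i] > budget and kept:
            continue
        kept.add(i)
        used += lengths[i]
    return [sentences[i] for i in sorted(kept)]


sentences = ["S0", "S1", "S2", "S3", "S4"]
attn = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.2], [0.05, 0.05], [0.3, 0.3]]
print(compress_by_attention(sentences, attn, ratio=5.0))  # ['S1']
print(compress_by_attention(sentences, attn, ratio=2.5))  # ['S1', 'S4']
```

Selecting whole sentences rather than individual tokens keeps the compressed context readable for the downstream model. This is the sentence-level granularity the summary refers to.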
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Published: 17 May, 2025 at 11:02 AM

91.74 🤔

This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

Published: 23 May, 2025 at 11:14 AM

91.73 🤔

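Token Recycling proposes a training-free speculative decoding method. It recycles candidate tokens and uses an adjacency matrix to build a draft tree, which speeds up large language model inference by roughly 2×, a gain more than 30% higher than other training-free methods.
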
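The recycling step and the draft-tree construction can be sketched with NumPy. In this sketch, each row of an adjacency matrix stores the top-k candidate next tokens most recently predicted after a given token. A draft tree is then expanded breadth-first from those rows, so no separate draft model is needed. The class and method names are hypothetical. The uniform tree width is a simplification. The verification pass, in which the target model checks all draft paths at once, is omitted.

```python
from typing import List, Tuple

import numpy as np


class TokenRecycler:
    """Sketch: an adjacency matrix remembers, for each token id, the top-k candidate
    next tokens from the most recent step that followed it; these recycled
    candidates are expanded into a draft tree without a draft model."""

    def __init__(self, vocab_size: int, k: int = 4) -> None:
        # Row t holds the recycled top-k successors of token t; -1 means "never seen".
        self.adj = np.full((vocab_size, k), -1, dtype=np.int64)
        self.k = k

    def update(self, token: int, logits: np.ndarray) -> None:
        """Recycle one decoding step: keep the top-k ids of the distribution predicted after `token`."""
        self.adj[token] = np.argsort(logits)[::-1][: self.k]

    def draft_tree(self, root: int, depth: int, width: int) -> List[Tuple[int, ...]]:
        """Breadth-first expansion from `root`; returns every root-to-node path (root excluded)."""
        drafts: List[Tuple[int, ...]] = []
        frontier: List[Tuple[int, ...]] = [(root,)]
        for _ in range(depth):
            next_frontier = []
            for path in frontier:
                for cand in self.adj[path[-1], :width]:
                    if cand < 0:  # no recycled candidate in this slot yet
                        continue
                    child = path + (int(cand),)
                    next_frontier.append(child)
                    drafts.append(child[1:])
            frontier = next_frontier
        return drafts


rec = TokenRecycler(vocab_size=5, k=2)
rec.update(0, np.array([0.1, 0.5, 0.9, 0.0, 0.2]))  # top-2 after token 0: [2, 1]
rec.update(2, np.array([0.0, 0.1, 0.2, 1.0, 0.5]))  # top-2 after token 2: [3, 4]
print(rec.draft_tree(root=0, depth=2, width=2))
# [(2,), (1,), (2, 3), (2, 4)]
```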
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation

Published: 3 Jun, 2025 at 11:27 AM

91.67 🤔

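This paper proposes Mixup Model Merge (M³). The method performs randomized linear interpolation in parameter space, sampling the contribution ratio of each model from a Beta distribution, and substantially improves the performance, out-of-distribution robustness, and adversarial robustness of merged large language models.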
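The interpolation itself is short enough to show directly. The sketch below assumes two models given as dictionaries of NumPy arrays with identical keys and shapes. It draws a single ratio λ ~ Beta(α, α) and computes λ·θ_A + (1 − λ)·θ_B for every tensor. `mixup_merge` and the `Params` alias are hypothetical names. The sketch operates on full weights, and how M³ combines with other merging methods is not modeled.

```python
from typing import Dict, Optional, Tuple

import numpy as np

Params = Dict[str, np.ndarray]


def mixup_merge(theta_a: Params, theta_b: Params, alpha: float = 1.0,
                rng: Optional[np.random.Generator] = None) -> Tuple[Params, float]:
    """Merge two models by linear interpolation with a random ratio lam ~ Beta(alpha, alpha):
    merged = lam * theta_a + (1 - lam) * theta_b, applied to every parameter tensor.
    Both dicts are assumed to share the same keys and tensor shapes."""
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    merged = {name: lam * theta_a[name] + (1.0 - lam) * theta_b[name] for name in theta_a}
    return merged, lam


a = {"w": np.array([1.0, 1.0]), "b": np.array([0.0])}
b = {"w": np.array([3.0, 3.0]), "b": np.array([2.0])}
merged, lam = mixup_merge(a, b, alpha=0.5, rng=np.random.default_rng(0))
print(lam, merged["w"], merged["b"])
```

The shape parameter α controls how random the merge is. With α = 1, λ is uniform on [0, 1]. With α < 1, λ is pushed toward 0 or 1, favoring one model. With α > 1, λ concentrates near 0.5, close to plain averaging.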