Tag: Efficiency
All the articles with the tag "Efficiency".
-
Temporal Sampling for Forgotten Reasoning in LLMs
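This paper reveals the "Temporal Forgetting" phenomenon in the fine-tuning of large language models and proposes "Temporal Sampling", which draws answers from multiple training checkpoints to significantly improve reasoning performance (Pass@k gains of 4 to 19 percentage points) and uses LoRA adaptation to reduce storage cost.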
-
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective
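Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to achieve up to 5x compression while matching the QA performance of 7B-scale systems on the LongBench benchmark.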
-
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.
-
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
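Token Recycling proposes a training-free speculative decoding method that recycles candidate tokens and uses an adjacency matrix to build a draft tree, achieving roughly 2x inference speedup for large language models, an improvement of more than 30% over other training-free methods.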
-
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
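This paper proposes Mixup Model Merge (M³), which performs randomized linear interpolation in parameter space, sampling the contribution ratio from a Beta distribution, and significantly improves the performance, out-of-distribution robustness, and adversarial robustness of merged large language models.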
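
As a rough illustration of the idea in the Mixup Model Merge summary, the sketch below interpolates two sets of parameters with a ratio drawn from a Beta distribution. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `mixup_merge`, the `alpha` shape parameter, and the plain dict-of-lists parameter format are all hypothetical, and the actual procedure may differ (for example in how it combines with existing merging methods).

```python
import random


def mixup_merge(params_a, params_b, alpha=1.0, rng=None):
    """Merge two parameter dicts by randomized linear interpolation.

    Assumptions (not from the source): parameters are dicts mapping a
    name to a list of floats, and the ratio is drawn from
    Beta(alpha, alpha).
    """
    if params_a.keys() != params_b.keys():
        raise ValueError("both models must have the same parameter names")
    rng = rng or random.Random()
    # Sample the contribution ratio of model A; model B gets 1 - lam.
    lam = rng.betavariate(alpha, alpha)
    merged = {}
    for name, values_a in params_a.items():
        values_b = params_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [lam * a + (1.0 - lam) * b
                        for a, b in zip(values_a, values_b)]
    return merged, lam


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0], "b": [0.5]}
    model_b = {"w": [3.0, 4.0], "b": [1.5]}
    merged, lam = mixup_merge(model_a, model_b, alpha=2.0,
                              rng=random.Random(0))
    print(f"lambda = {lam:.3f}", merged)
```

Each merged value lies between the corresponding values of the two inputs, and passing a seeded `random.Random` makes the sampled ratio reproducible.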