Tag: Long Context

All the articles with the tag "Long Context".

LoLA: Low-Rank Linear Attention With Sparse Caching

Published: 1 Jun, 2025 at 11:40 AM

88.31 🤔

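LoLA combines three forms of memory (linear attention, a sliding window, and a sparse cache) to mitigate memory collisions at inference time, substantially improving linear attention models on long-context associative recall and language modeling tasks while keeping memory usage efficient.

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization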

Published: 21 May, 2025 at 11:24 AM

88.16 🤔

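SoLoPO decomposes long-context preference optimization into short-context preference optimization plus short-to-long reward alignment, substantially improving the performance and training efficiency of large language models on long-context tasks while preserving their short-context capabilities.

Does quantization affect models' performance on long-context tasks?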

Published: 2 Jun, 2025 at 11:34 AM

87.84 🤔

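This paper systematically evaluates how quantization affects large language models on long-context tasks. It finds that 8-bit quantization largely preserves accuracy (a drop of about 0.8%), whereas 4-bit quantization causes significant losses (up to 59%). The impact varies by model, task, and language, underscoring the need for caution when applying quantization in long-context and multilingual settings.

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning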

Published: 30 May, 2025 at 11:16 AM

87.82 🤔

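This paper experimentally verifies a positive correlation between long-context ability and reasoning performance, proposes a training strategy that strengthens long-context ability before supervised fine-tuning, and significantly improves model performance on mathematical reasoning benchmarks.

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free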

Published: 17 May, 2025 at 11:22 PM

87.55 🤔

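By introducing a head-specific sigmoid gate after the SDPA output of softmax attention, this paper significantly improves the performance, training stability, and long-context generalization of a 15B MoE model and a 1.7B dense model, while eliminating the attention sink phenomenon.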
