Tag: Long Context

All the articles with the tag "Long Context".

LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation

Published: 1 Jun, 2025 at 11:52 AM

90.28 🤔

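This paper proposes LongReD, a multi-objective training strategy that combines long-text training, short-text distillation, and short-to-long distillation. It effectively mitigates the performance degradation of long-context large language models on short-text tasks while maintaining or improving their long-text capabilities.
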
Why do LLMs attend to the first token?

Published: 17 May, 2025 at 11:04 AM

90.22 🤔

This paper argues that attention sinks in LLMs, particularly at the first token, are a useful mechanism for preventing over-mixing of information in deep Transformers. The argument is supported by theoretical insights and empirical evidence from Gemma 7B, LLaMa 3.1 models, and pre-training experiments, showing stronger sinks with larger models and longer contexts.

M+: Extending MemoryLLM with Scalable Long-Term Memory

Published: 3 Jun, 2025 at 11:27 AM

90.20 🤔

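M+ introduces a long-term memory mechanism and a co-trained retriever, significantly extending MemoryLLM's knowledge retention to more than 160k tokens. It outperforms baselines on long-context tasks while keeping GPU memory consumption low.
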
LIFEBench: Evaluating Length Instruction Following in Large Language Models

Published: 25 May, 2025 at 11:47 AM

88.64 🤔

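This paper introduces the LIFEBench benchmark and systematically evaluates 26 large language models on length instruction following. It finds that the models generally perform poorly under long length constraints and fall far short of the maximum output lengths claimed by vendors, revealing fundamental limitations in length awareness and long-text generation.
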
Skywork Open Reasoner 1 Technical Report

Published: 3 Jun, 2025 at 11:44 AM

88.60 🤔

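Skywork-OR1 proposes the MAGIC framework, which uses multi-stage training and reinforcement learning with adaptive entropy control to significantly improve the performance of long chain-of-thought reasoning models on math and coding tasks, surpassing DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.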
