Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
SoftCoT++ enables test-time scaling by introducing diverse initial tokens and contrastive learning in a continuous latent space, significantly improving the performance of large language models on multiple reasoning tasks and showing synergy with conventional discrete-space scaling methods.
-
ATLAS: Learning to Optimally Memorize the Context at Test Time
This paper proposes Atlas, a high-capacity long-term memory module that optimizes context memorization via a sliding-window Omega rule and the Muon optimizer, significantly outperforming Transformers and modern RNNs on language modeling and long-context understanding tasks.
-
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Through comparative experiments, this paper shows that although long-sequence models such as Mamba2 theoretically support unbounded context, in practice they face significant limitations on long-context tasks just as Transformers do, degrading especially when the position of relevant information or the data format changes; the underlying causes call for further study.
-
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant gains on the LONGMEMEVAL and HELMET benchmarks while preserving short-context capabilities.
-
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Using novel task designs and analysis of Pythia training checkpoints, this paper shows that in-context learning (ICL) in large language models is neither pure memorization nor a symbolic algorithm, but a limited form of generalization that relies on statistical properties of the data; it also examines ICL's training dynamics and its connection to internal mechanisms.