Tag: Efficiency
All the articles with the tag "Efficiency".
-   MoM: Linear Sequence Modeling with Mixture-of-Memories
    The Mixture-of-Memories (MoM) architecture introduces multiple independent memory states with a routing mechanism to increase memory capacity and reduce interference in linear sequence modeling. It achieves significant gains over other linear models on recall-intensive tasks and approaches Transformer performance at larger scales while maintaining efficiency (a toy routing sketch appears after this list).
-   Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
    This paper proposes layer-wise optimal task-vector merging (LOT Merging), which optimizes the model-merging process by minimizing feature drift. It significantly outperforms training-free baselines on vision and vision-language tasks, improving average accuracy by up to 4.4% (a merging sketch follows the list).
-   LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
    This paper proposes LIFT, a low-rank-guided sparse fine-tuning method that applies a low-rank approximation and then fine-tunes only the selected principal weights. On reasoning tasks it significantly outperforms full-parameter fine-tuning and methods such as LoRA while remaining memory-efficient (a masking sketch follows the list).
-   QKV Projections Require a Fraction of Their Memory
    This paper proposes PAMM, which approximates the input tensor with randomly selected representative tokens, cutting the memory footprint of the Q, K, and V projections in attention by up to 512x while largely preserving model performance in both pre-training and fine-tuning (a gradient-estimation sketch follows the list).
-   Always Skip Attention
    This paper theoretically demonstrates that Self-Attention Blocks in Vision Transformers are ill-conditioned without skip connections, highlights the skip connections' role as regularizers, and proposes Token Graying (SVD and DCT variants) to improve the conditioning of input tokens, yielding modest gains in supervised and self-supervised tasks (a conditioning sketch follows the list).
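
For the MoM entry above, here is a minimal, illustrative sketch of routing tokens across independent memory states in a linear-attention-style layer. It is not the paper's implementation; the function name `mom_step`, the top-k softmax router, and the outer-product memory update are assumptions chosen only to show the general mechanism.

```python
import numpy as np

def mom_step(x_t, memories, Wq, Wk, Wv, Wr, top_k=2):
    """One token step of a toy mixture-of-memories linear-attention layer.

    A router scores the independent memory matrices, the token's
    key/value outer product is written only into the top-k memories,
    and the output is a routing-weighted read-out with the query.
    """
    q, k, v = x_t @ Wq, x_t @ Wk, x_t @ Wv
    logits = x_t @ Wr                                   # one score per memory
    chosen = np.argsort(logits)[-top_k:]                # indices of routed memories
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()

    out = np.zeros_like(q)
    for w, m in zip(weights, chosen):
        memories[m] += np.outer(k, v)                   # update only routed memories
        out += w * (q @ memories[m])                    # weighted read-out
    return out

# toy usage over a 10-token sequence
rng = np.random.default_rng(0)
d_model, d_head, n_mem = 16, 8, 4
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
Wr = rng.normal(size=(d_model, n_mem))
memories = [np.zeros((d_head, d_head)) for _ in range(n_mem)]
for x_t in rng.normal(size=(10, d_model)):
    y_t = mom_step(x_t, memories, Wq, Wk, Wv, Wr)
```

Keeping the memories separate and writing each token to only a few of them is what limits interference between unrelated pieces of context.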
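
For the LOT Merging entry, the sketch below shows one generic way to merge a single linear layer by minimizing feature drift in closed form. The objective, the function name `lot_merge_layer`, and the pseudo-inverse solution are assumptions for illustration; the paper's actual layer-wise procedure may differ.

```python
import numpy as np

def lot_merge_layer(task_weights, task_inputs):
    """Merge one linear layer's weights from several fine-tuned models.

    Picks the merged weight W that minimises the total feature drift
        sum_i || X_i @ W - X_i @ W_i ||_F^2,
    i.e. on model i's own inputs X_i the merged layer should reproduce
    that model's features. The minimiser has a closed form.
    """
    d_in = task_weights[0].shape[0]
    gram = np.zeros((d_in, d_in))
    target = np.zeros_like(task_weights[0])
    for X, W in zip(task_inputs, task_weights):
        gram += X.T @ X
        target += X.T @ (X @ W)
    return np.linalg.pinv(gram) @ target                # least-squares solution

# toy usage: merge two fine-tuned versions of a 32 -> 16 layer
rng = np.random.default_rng(1)
Ws = [rng.normal(size=(32, 16)) for _ in range(2)]
Xs = [rng.normal(size=(100, 32)) for _ in range(2)]
W_merged = lot_merge_layer(Ws, Xs)
```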
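
For the LIFT entry, the sketch below selects "principal" weights from a low-rank reconstruction and restricts updates to them. The function name `principal_weight_mask`, the rank and density values, and the magnitude-thresholding rule are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def principal_weight_mask(W, rank=16, density=0.05):
    """Pick 'principal' weights of W via a low-rank approximation.

    W is reconstructed at rank r with a truncated SVD, and the entries
    with the largest magnitude in that reconstruction are kept; only
    the masked entries would be updated during sparse fine-tuning.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_r = (U[:, :rank] * S[:rank]) @ Vt[:rank]          # rank-r reconstruction
    k = max(1, int(density * W.size))
    threshold = np.partition(np.abs(W_r).ravel(), -k)[-k]
    return np.abs(W_r) >= threshold                     # boolean update mask

# toy usage: mask the gradient so only principal weights move
rng = np.random.default_rng(2)
W = rng.normal(size=(256, 256))
mask = principal_weight_mask(W, rank=8, density=0.02)
grad = rng.normal(size=W.shape)
W -= 0.01 * (grad * mask)                               # sparse update step
```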
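
For the PAMM entry, the sketch below shows the general idea of saving only a random subset of input tokens for the backward pass of a projection and estimating the weight gradient from them. The class name `SampledProjection`, the uniform sampling, and the rescaling are assumptions; the paper's actual token-selection and approximation scheme (and its 512x figure) are not reproduced here.

```python
import torch

class SampledProjection(torch.autograd.Function):
    """Linear projection that stores only sampled tokens for backward.

    The forward pass computes the exact projection x @ W but saves just
    m randomly chosen rows of x; the weight gradient is then estimated
    from those rows, rescaled so its expectation matches the exact
    gradient. The input gradient needs no saved activations at all.
    """

    @staticmethod
    def forward(ctx, x, weight, num_samples):
        idx = torch.randperm(x.shape[0], device=x.device)[:num_samples]
        ctx.save_for_backward(x[idx], weight, idx)
        ctx.num_tokens = x.shape[0]
        return x @ weight

    @staticmethod
    def backward(ctx, grad_out):
        x_sub, weight, idx = ctx.saved_tensors
        scale = ctx.num_tokens / x_sub.shape[0]
        grad_x = grad_out @ weight.T                    # exact
        grad_w = scale * (x_sub.T @ grad_out[idx])      # sampled estimate
        return grad_x, grad_w, None

# toy usage: a Q projection over 1024 tokens, keeping only 32 of them
x = torch.randn(1024, 64, requires_grad=True)
Wq = torch.randn(64, 64, requires_grad=True)
q = SampledProjection.apply(x, Wq, 32)
q.sum().backward()
```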
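
For the Always Skip Attention entry, the sketch below illustrates SVD-based re-conditioning of a token matrix by flattening its singular-value spectrum. The power-law compression and the function name `svd_token_graying` are assumptions; the paper's exact Token Graying transform and its DCT variant differ.

```python
import numpy as np

def svd_token_graying(tokens, alpha=0.5):
    """Re-condition a token matrix by flattening its singular values.

    The N x d token matrix is decomposed with SVD, its singular values
    are compressed toward the largest one (here via a power alpha < 1),
    and the matrix is reconstructed. This shrinks the gap between
    dominant and weak token directions, lowering the condition number
    before the tokens enter a self-attention block.
    """
    U, S, Vt = np.linalg.svd(tokens, full_matrices=False)
    S_flat = S.max() * (S / S.max()) ** alpha           # compressed spectrum
    return (U * S_flat) @ Vt

# toy usage: condition number drops from kappa to kappa ** alpha
rng = np.random.default_rng(3)
X = rng.normal(size=(196, 64)) @ np.diag(np.geomspace(1.0, 1e-3, 64))
print(np.linalg.cond(X), np.linalg.cond(svd_token_graying(X)))
```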