Posts

All the articles I've posted.

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

Published: 4 Jun, 2025 at 11:28 AM

89.30 🤔

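This paper proposes Layer-wise Optimal Task Vector Merging (LOT Merging), which optimizes model merging by minimizing feature drift. It significantly outperforms training-free baselines on vision and vision-language tasks, improving average accuracy by up to 4.4%.

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More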

Published: 23 May, 2025 at 11:13 AM

89.28 🤔

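This paper proposes the MEAP training paradigm, which introduces a random masking strategy into next-token prediction. It significantly improves large language models on key-information retrieval and long-context reasoning tasks while preserving computational efficiency and architectural compatibility.

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning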

Published: 26 May, 2025 at 11:24 AM

89.27 🤔

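Through experiments and theoretical analysis, this paper shows that RLVR improves the accuracy of large language models but not their capability because it is biased toward optimizing easy problems, whereas distillation improves capability only when it introduces new knowledge and otherwise behaves much like RLVR.

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning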

Published: 4 Jun, 2025 at 12:00 PM

89.25 🤔

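This paper proposes LIFT, a low-rank-guided sparse fine-tuning method that selects the principal weights after low-rank approximation and fine-tunes only those. On reasoning tasks it significantly outperforms full-parameter fine-tuning, LoRA, and similar methods while remaining memory-efficient.

QKV Projections Require a Fraction of Their Memory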

Published: 5 Jun, 2025 at 11:22 AM

89.22 🤔

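This paper proposes PAMM, which approximates the input tensor by randomly selecting representative tokens. It cuts the memory footprint of the Q, K, and V projections in attention by up to 512× while largely maintaining model performance in both pre-training and fine-tuning.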
