Tag: Efficiency

All the articles with the tag "Efficiency".

Intra-Layer Recurrence in Transformers for Language Modeling

Published: 7 May, 2025 at 12:12 AM

69.79 🤔

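This paper proposes Intra-Layer Recurrence (ILR), which selectively repeats specific layers (especially early layers) within a single Transformer forward pass, improving language-modeling perplexity without increasing the parameter count. However, the added compute cost and insufficient validation on large-scale models limit its practicality.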
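
The recurrence idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each layer is a residual function and that recurrence is expressed as a per-layer repeat count. `make_toy_layer`, `forward_with_recurrence`, and `effective_depth` are hypothetical names, and the toy scalar layers stand in for real Transformer blocks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a Transformer block: residual update x + scale * x."""
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def forward_with_recurrence(x: Vector, layers: Sequence[Layer], repeats: Sequence[int]) -> Vector:
    """One forward pass in which layer i is applied repeats[i] times with the same weights."""
    if len(layers) != len(repeats):
        raise ValueError("need one repeat count per layer")
    for layer, r in zip(layers, repeats):
        if r < 1:
            raise ValueError("each layer must run at least once")
        for _ in range(r):
            x = layer(x)  # same parameters reused: no new weights are created
    return x


def effective_depth(repeats: Sequence[int]) -> int:
    """Number of block applications per token, which is what drives the extra compute."""
    return sum(repeats)


if __name__ == "__main__":
    layers = [make_toy_layer(0.5), make_toy_layer(1.0)]
    print(forward_with_recurrence([1.0], layers, [1, 1]))  # plain stack: [3.0]
    print(forward_with_recurrence([1.0], layers, [2, 1]))  # early layer repeated: [4.5]
    print(effective_depth([2, 1]))  # 3
```

Repeating the first layer raises the effective depth from 2 to 3 block applications while the list of layers (the parameters) stays the same, which mirrors the trade-off in the summary: no new parameters, but more compute per token.
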
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization

Published: 9 May, 2025 at 11:10 AM

69.29 🤔

RWKVQuant introduces a post-training quantization (PTQ) framework tailored to RWKV models. It uses a coarse-to-fine proxy to guide a hybrid of scalar and vector quantization and optimizes codebooks for element-wise operations, achieving roughly 3-bit quantization with minimal accuracy loss along with significant memory savings and speedups.
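
To make "hybrid of scalar and vector quantization" concrete, here is a minimal Python sketch of choosing between the two per weight group. It is not RWKVQuant's method: the paper's coarse-to-fine proxy is not reproduced here, and the swapped-in stand-in proxy simply compares reconstruction MSE at the same index budget (codebook storage ignored). Uniform min-max scalar quantization, plain k-means vector quantization with a first-distinct-vectors initialization, the tie-break toward scalar quantization, and all function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def scalar_quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Uniform min-max scalar quantization: every weight snaps to one of 2**bits levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [float(w) for w in weights]
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def _sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(weights: Sequence[float], dim: int, codebook_size: int, iters: int = 10) -> List[float]:
    """Group weights into dim-sized vectors and replace each by its nearest k-means codeword."""
    if len(weights) % dim:
        raise ValueError("len(weights) must be a multiple of dim")
    vecs = [tuple(weights[i:i + dim]) for i in range(0, len(weights), dim)]
    codebook = list(dict.fromkeys(vecs))[:codebook_size]  # init: first distinct vectors
    for _ in range(iters):  # plain Lloyd (k-means) iterations
        clusters: List[List[Vec]] = [[] for _ in codebook]
        for v in vecs:
            nearest = min(range(len(codebook)), key=lambda c: _sq_dist(v, codebook[c]))
            clusters[nearest].append(v)
        codebook = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else codebook[j]
            for j, members in enumerate(clusters)
        ]
    out: List[float] = []
    for v in vecs:
        out.extend(min(codebook, key=lambda c: _sq_dist(v, c)))
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_quantizer(weights: Sequence[float], bits: int = 1, dim: int = 2) -> str:
    """Stand-in proxy (assumption): lower reconstruction error wins at `bits` index bits
    per weight; ties go to scalar quantization, assumed to be the cheaper kernel."""
    sq = scalar_quantize(weights, bits)
    vq = vector_quantize(weights, dim, codebook_size=2 ** (bits * dim))
    return "scalar" if mse(weights, sq) <= mse(weights, vq) else "vector"
```

Weights that already sit on a uniform grid are reproduced exactly by scalar quantization, while weights clustered at a few off-grid points favor a vector codebook; a real proxy would make this call without running both quantizers.
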
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

Published: 4 May, 2025 at 04:33 PM

69.21 🤔

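This paper proposes Mixture of Sparse Attention (MoSA), which implements content-based sparse attention through expert-choice routing. Under the same compute budget, it significantly improves the language-modeling performance of Transformer models while also reducing resource usage.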
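
Expert-choice routing for a single sparse head can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the paper's implementation: a sigmoid router, identity query/key/value projections, and a causal mask applied in original token order among the selected tokens are all simplifications chosen here, and `expert_choice_sparse_head` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def expert_choice_sparse_head(
    tokens: Sequence[Vector], router_w: Vector, k: int
) -> Tuple[List[Vector], List[int]]:
    """One sparse attention head: the head (the "expert") scores every token, keeps
    only its top-k, and runs causal softmax attention among those k tokens alone."""
    n, d = len(tokens), len(tokens[0])
    k = min(k, n)
    # 1. Router: one score per token for this head (sigmoid is an assumption).
    scores = [_sigmoid(_dot(router_w, t)) for t in tokens]
    # 2. Expert choice: the head picks k tokens; restore original order for causality.
    chosen = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    out = [[0.0] * d for _ in range(n)]  # unselected tokens get no update from this head
    for qi, i in enumerate(chosen):
        keys = chosen[: qi + 1]  # causal mask within the selected subset
        # Identity Q/K/V projections keep the sketch short (assumption).
        logits = [_dot(tokens[i], tokens[j]) / math.sqrt(d) for j in keys]
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        attended = [sum(w * tokens[j][c] for w, j in zip(weights, keys)) / z for c in range(d)]
        # 3. Scale by the router score so that, in a trained model, gradients would reach the router.
        out[i] = [scores[i] * v for v in attended]
    return out, chosen


if __name__ == "__main__":
    toks = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, -1.0]]
    out, chosen = expert_choice_sparse_head(toks, router_w=[1.0, 0.0], k=2)
    print(chosen)  # [0, 2]
```

With n tokens and k selected, the quadratic attention cost applies to k tokens instead of n, which is where the savings at a fixed compute budget come from.
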
Training Plug-n-Play Knowledge Modules with Deep Context Distillation

Published: 4 May, 2025 at 04:28 PM

69.06 🤔

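This paper proposes training plug-and-play knowledge modules with deep context distillation, an approach that efficiently integrates document knowledge in low-data settings. Experiments show that it outperforms conventional methods on question-answering tasks and is complementary to RAG.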
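
The summary does not spell out the training objective. As a sketch only, one plausible shape of a deep context-distillation loss is shown below: the teacher is the base model reading the document in its context, the student is the same model with the knowledge module plugged in but no document, and the loss matches both the output distributions (KL divergence) and the per-layer hidden states. The L1 hidden-state term, its normalization by the teacher's norm, `hidden_weight`, and all function names are assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # [tokens][hidden_dim] activations at one layer


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two probability vectors over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def hidden_state_loss(teacher: Matrix, student: Matrix) -> float:
    """L1 distance between teacher and student hidden states at one layer,
    normalized by the teacher's L1 norm (normalization is an assumption)."""
    num = sum(abs(t - s) for tr, sr in zip(teacher, student) for t, s in zip(tr, sr))
    den = sum(abs(t) for tr in teacher for t in tr) or 1.0
    return num / den


def deep_context_distillation_loss(
    teacher_layers: Sequence[Matrix],
    student_layers: Sequence[Matrix],
    teacher_probs: Sequence[Sequence[float]],
    student_probs: Sequence[Sequence[float]],
    hidden_weight: float = 1.0,
) -> float:
    """Both models are evaluated on the same target tokens; only the teacher sees the document."""
    hidden = sum(hidden_state_loss(t, s) for t, s in zip(teacher_layers, student_layers))
    output = sum(kl_divergence(p, q) for p, q in zip(teacher_probs, student_probs)) / len(teacher_probs)
    return output + hidden_weight * hidden
```

Matching hidden states at every layer, rather than only the final token distribution, is what "deep" refers to in this sketch; the student receives a dense training signal even when only a small amount of document-derived data is available.
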
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning

Published: 9 May, 2025 at 11:06 AM

68.33 🤔

This paper introduces SIMPLEMIX, a simple method for mixing on- and off-policy data in language model preference optimization. It shows that the two data sources have complementary strengths (on-policy data helps on reasoning tasks, off-policy data on open-ended tasks) and that combining them yields a 6.03% average improvement over single-source methods on AlpacaEval 2.0.
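
The summary describes the mixing only at a high level. A minimal Python sketch, assuming the mix is a fixed fraction of on-policy preference pairs sampled together with off-policy pairs into a single shuffled training set; `PreferencePair`, `mix_preference_data`, and the 50/50 default are hypothetical choices, not the paper's recipe.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    source: str  # "on" = sampled from the current policy, "off" = pre-collected responses


def mix_preference_data(
    on_policy: Sequence[PreferencePair],
    off_policy: Sequence[PreferencePair],
    size: int,
    on_fraction: float = 0.5,
    seed: int = 0,
) -> List[PreferencePair]:
    """Draw round(size * on_fraction) on-policy pairs and the rest off-policy, then shuffle."""
    if not 0.0 <= on_fraction <= 1.0:
        raise ValueError("on_fraction must be in [0, 1]")
    n_on = round(size * on_fraction)
    n_off = size - n_on
    if n_on > len(on_policy) or n_off > len(off_policy):
        raise ValueError("not enough examples in one of the sources")
    rng = random.Random(seed)  # seeded for reproducible mixes
    mixed = rng.sample(list(on_policy), n_on) + rng.sample(list(off_policy), n_off)
    rng.shuffle(mixed)  # interleave sources instead of training on one after the other
    return mixed
```

Keeping the ratio as a single parameter makes it straightforward to compare pure on-policy (`on_fraction=1.0`), pure off-policy (`0.0`), and mixed training with the same downstream preference-optimization code.
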
