Tag: Transformer
All the articles with the tag "Transformer".
- Looped Transformers for Length Generalization
  This paper proposes the Looped Transformers approach: a looped architecture with an adaptive number of steps that markedly improves Transformers' length generalization on algorithmic tasks, outperforming conventional methods across a range of tasks.
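  A minimal sketch, assuming PyTorch, of the looped idea (not the paper's code): a single shared block is applied repeatedly, and the number of loop steps is chosen from the input length at inference time. The class name, sizes, and the `length // 5` step rule are illustrative assumptions.

  ```python
  import torch
  import torch.nn as nn

  class LoopedTransformer(nn.Module):
      """Toy looped Transformer: one shared block reused for a variable number of steps."""
      def __init__(self, d_model=128, nhead=4, vocab_size=32):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, d_model)
          # A single shared layer reused across loop iterations (no extra parameters per step).
          self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
          self.head = nn.Linear(d_model, vocab_size)

      def forward(self, tokens, num_loops):
          h = self.embed(tokens)
          for _ in range(num_loops):
              h = self.block(h)  # same weights every iteration
          return self.head(h)

  model = LoopedTransformer()
  tokens = torch.randint(0, 32, (2, 20))
  # Adaptive step count: scale the number of iterations with sequence length.
  logits = model(tokens, num_loops=max(1, tokens.shape[1] // 5))
  print(logits.shape)  # torch.Size([2, 20, 32])
  ```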
- RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
  RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
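  A toy sketch, assuming PyTorch, of the general retrieval-style attention this summary points at: for each query, look up only the most similar cached keys and attend over that subset. RetroInfer's wave index and wave buffer are considerably more involved; the function below is an illustrative stand-in, not the system's API.

  ```python
  import torch

  def retrieval_attention(q, keys, values, top_k=256):
      """q: (d,); keys/values: (n, d). Attend only over the top_k keys most similar to q."""
      scores = keys @ q                                   # similarity to every cached key
      idx = torch.topk(scores, k=min(top_k, keys.shape[0])).indices
      sub = (keys[idx] @ q) / keys.shape[1] ** 0.5        # scaled scores on the retrieved subset
      weights = torch.softmax(sub, dim=0)
      return weights @ values[idx]                        # weighted sum of retrieved values

  keys = torch.randn(100_000, 128)                        # stand-in long-context KV cache
  values = torch.randn(100_000, 128)
  q = torch.randn(128)
  print(retrieval_attention(q, keys, values).shape)       # torch.Size([128])
  ```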
- Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
  This paper explores effective distillation of HuBERT for ASR by comparing student model structures, introducing a discriminative loss for improved low-resource performance, and proposing front-end distillation from waveform to Fbank features, achieving 17% parameter reduction and doubled inference speed with minor performance degradation.
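  A rough sketch, assuming PyTorch, of feature-level distillation from a teacher speech model to a smaller student: the student is trained to match the teacher's hidden states. The exact discriminative loss used in the paper may differ; the common L1-plus-cosine objective below is only a stand-in.

  ```python
  import torch
  import torch.nn.functional as F

  def distill_loss(student_feats, teacher_feats):
      """student_feats, teacher_feats: (batch, time, dim) hidden states."""
      l1 = F.l1_loss(student_feats, teacher_feats)
      cos = F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
      return l1 - cos  # minimize distance, maximize directional agreement

  student = torch.randn(4, 100, 768, requires_grad=True)  # student outputs
  teacher = torch.randn(4, 100, 768)                       # frozen teacher targets
  loss = distill_loss(student, teacher)
  loss.backward()
  print(float(loss))
  ```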
- Rethinking Invariance in In-context Learning
  This paper introduces Invariant In-Context Learning (InvICL), a novel ICL method that achieves permutation invariance, information non-leakage, and context interdependence using leave-one-out encoding and parallel implementation, outperforming both invariant and non-invariant baselines in generalization and performance across synthetic and real-world tasks.
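  A toy sketch, assuming PyTorch, of the leave-one-out intuition: each demonstration is contextualized by an order-insensitive summary of the other demonstrations, so shuffling the demonstrations permutes but never changes the encodings. InvICL's actual parallel, attention-mask-based implementation is more subtle than this.

  ```python
  import torch

  def leave_one_out_encode(demo_embs):
      """demo_embs: (n, d) per-demonstration embeddings."""
      n = demo_embs.shape[0]
      total = demo_embs.sum(dim=0, keepdim=True)            # (1, d)
      loo_context = (total - demo_embs) / max(n - 1, 1)      # mean of the *other* demos
      return demo_embs + loo_context                         # contextualized demonstrations

  demos = torch.randn(5, 64)
  enc = leave_one_out_encode(demos)
  perm = torch.randperm(5)
  # Permutation equivariance: permuting the inputs permutes the outputs.
  assert torch.allclose(leave_one_out_encode(demos[perm]), enc[perm], atol=1e-6)
  ```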
- Intra-Layer Recurrence in Transformers for Language Modeling
  This paper proposes Intra-Layer Recurrence (ILR), which selectively loops specific layers (especially early ones) within a single Transformer forward pass, improving language-modeling perplexity without adding parameters; however, the increased compute cost and limited validation on large-scale models constrain its practicality.
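  A minimal sketch, assuming PyTorch, of intra-layer recurrence: within one forward pass, some layers (here the early ones) are applied more than once, adding compute but no parameters. The per-layer repeat counts below are illustrative, not the paper's configuration.

  ```python
  import torch
  import torch.nn as nn

  class ILRStack(nn.Module):
      """Layer stack where each layer may be reused several times per forward pass."""
      def __init__(self, num_layers=4, d_model=128, nhead=4, repeats=(3, 2, 1, 1)):
          super().__init__()
          self.layers = nn.ModuleList(
              [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
               for _ in range(num_layers)]
          )
          self.repeats = repeats  # how many times each layer is applied

      def forward(self, h):
          for layer, r in zip(self.layers, self.repeats):
              for _ in range(r):   # recurrence inside a single forward pass
                  h = layer(h)
          return h

  stack = ILRStack()
  h = torch.randn(2, 16, 128)      # (batch, seq, d_model)
  print(stack(h).shape)            # torch.Size([2, 16, 128])
  ```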