Tag: Transformer
All the articles with the tag "Transformer".
- Looped Transformers for Length Generalization
  This paper proposes the Looped Transformers approach: a looped architecture with an adaptive number of steps that markedly improves Transformers' length generalization on algorithmic tasks, outperforming conventional methods across a range of tasks.
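  A minimal sketch, assuming PyTorch, of the looped idea (not the paper's code): a single shared block is applied repeatedly, and the number of loop steps is chosen from the input length at inference time. The class name, sizes, and the `length // 5` step rule are illustrative assumptions.

  ```python
  import torch
  import torch.nn as nn

  class LoopedTransformer(nn.Module):
      """Toy looped Transformer: one shared block reused for a variable number of steps."""
      def __init__(self, d_model=128, nhead=4, vocab_size=32):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, d_model)
          # A single shared layer reused across loop iterations (no extra parameters per step).
          self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
          self.head = nn.Linear(d_model, vocab_size)

      def forward(self, tokens, num_loops):
          h = self.embed(tokens)
          for _ in range(num_loops):
              h = self.block(h)  # same weights every iteration
          return self.head(h)

  model = LoopedTransformer()
  tokens = torch.randint(0, 32, (2, 20))
  # Adaptive step count: scale the number of iterations with sequence length.
  logits = model(tokens, num_loops=max(1, tokens.shape[1] // 5))
  print(logits.shape)  # torch.Size([2, 20, 32])
  ```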
- RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
  RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
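  A toy sketch, assuming PyTorch, of the general retrieval-style attention this summary points at: for each query, look up only the most similar cached keys and attend over that subset. RetroInfer's wave index and wave buffer are considerably more involved; the function below is an illustrative stand-in, not the system's API.

  ```python
  import torch

  def retrieval_attention(q, keys, values, top_k=256):
      """q: (d,); keys/values: (n, d). Attend only over the top_k keys most similar to q."""
      scores = keys @ q                                   # similarity to every cached key
      idx = torch.topk(scores, k=min(top_k, keys.shape[0])).indices
      sub = (keys[idx] @ q) / keys.shape[1] ** 0.5        # scaled scores on the retrieved subset
      weights = torch.softmax(sub, dim=0)
      return weights @ values[idx]                        # weighted sum of retrieved values

  keys = torch.randn(100_000, 128)                        # stand-in long-context KV cache
  values = torch.randn(100_000, 128)
  q = torch.randn(128)
  print(retrieval_attention(q, keys, values).shape)       # torch.Size([128])
  ```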
- Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
  This paper explores effective distillation of HuBERT for ASR by comparing student model structures, introducing a discriminative loss for improved low-resource performance, and proposing front-end distillation from waveform to Fbank features, achieving 17% parameter reduction and doubled inference speed with minor performance degradation.
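  A rough sketch, assuming PyTorch, of feature-level distillation from a teacher speech model to a smaller student: the student is trained to match the teacher's hidden states. The exact discriminative loss used in the paper may differ; the common L1-plus-cosine objective below is only a stand-in.

  ```python
  import torch
  import torch.nn.functional as F

  def distill_loss(student_feats, teacher_feats):
      """student_feats, teacher_feats: (batch, time, dim) hidden states."""
      l1 = F.l1_loss(student_feats, teacher_feats)
      cos = F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
      return l1 - cos  # minimize distance, maximize directional agreement

  student = torch.randn(4, 100, 768, requires_grad=True)  # student outputs
  teacher = torch.randn(4, 100, 768)                       # frozen teacher targets
  loss = distill_loss(student, teacher)
  loss.backward()
  print(float(loss))
  ```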
- Rethinking Invariance in In-context Learning
  This paper introduces Invariant In-Context Learning (InvICL), a novel ICL method that achieves permutation invariance, information non-leakage, and context interdependence using leave-one-out encoding and parallel implementation, outperforming both invariant and non-invariant baselines in generalization and performance across synthetic and real-world tasks.
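  A toy sketch, assuming PyTorch, of the leave-one-out intuition: each demonstration is contextualized by an order-insensitive summary of the other demonstrations, so shuffling the demonstrations permutes but never changes the encodings. InvICL's actual parallel, attention-mask-based implementation is more subtle than this.

  ```python
  import torch

  def leave_one_out_encode(demo_embs):
      """demo_embs: (n, d) per-demonstration embeddings."""
      n = demo_embs.shape[0]
      total = demo_embs.sum(dim=0, keepdim=True)            # (1, d)
      loo_context = (total - demo_embs) / max(n - 1, 1)      # mean of the *other* demos
      return demo_embs + loo_context                         # contextualized demonstrations

  demos = torch.randn(5, 64)
  enc = leave_one_out_encode(demos)
  perm = torch.randperm(5)
  # Permutation equivariance: permuting the inputs permutes the outputs.
  assert torch.allclose(leave_one_out_encode(demos[perm]), enc[perm], atol=1e-6)
  ```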
- Intra-Layer Recurrence in Transformers for Language Modeling
  This paper proposes Intra-Layer Recurrence (ILR), which selectively loops specific layers (especially early ones) within a single Transformer forward pass, improving language-modeling perplexity without adding parameters; however, the increased compute cost and limited validation on large-scale models constrain its practicality.
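  A minimal sketch, assuming PyTorch, of intra-layer recurrence: within one forward pass, some layers (here the early ones) are applied more than once, adding compute but no parameters. The per-layer repeat counts below are illustrative, not the paper's configuration.

  ```python
  import torch
  import torch.nn as nn

  class ILRStack(nn.Module):
      """Layer stack where each layer may be reused several times per forward pass."""
      def __init__(self, num_layers=4, d_model=128, nhead=4, repeats=(3, 2, 1, 1)):
          super().__init__()
          self.layers = nn.ModuleList(
              [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
               for _ in range(num_layers)]
          )
          self.repeats = repeats  # how many times each layer is applied

      def forward(self, h):
          for layer, r in zip(self.layers, self.repeats):
              for _ in range(r):   # recurrence inside a single forward pass
                  h = layer(h)
          return h

  stack = ILRStack()
  h = torch.randn(2, 16, 128)      # (batch, seq, d_model)
  print(stack(h).shape)            # torch.Size([2, 16, 128])
  ```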