Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Tensor Product Attention Is All You Need
This paper proposes Tensor Product Attention (TPA), which compresses the KV cache via contextual tensor decomposition, substantially reducing inference memory usage while matching or outperforming baselines such as MHA and MQA on language modeling tasks.
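A minimal sketch of the general idea of a contextually factorized KV cache: instead of storing full per-token keys and values, store low-rank factors produced from the current hidden state and reconstruct K/V on the fly. The module layout, projection names, and rank are illustrative assumptions, not the paper's exact TPA formulation.

```python
# Hypothetical sketch: cache low-rank, context-dependent KV factors instead of full K/V.
import torch
import torch.nn as nn

class FactorizedKVCache(nn.Module):
    def __init__(self, d_model: int, n_heads: int, head_dim: int, rank: int = 2):
        super().__init__()
        self.n_heads, self.head_dim, self.rank = n_heads, head_dim, rank
        # Context-dependent factors: one projection over the head axis,
        # one over the feature axis, for keys and values respectively.
        self.k_head = nn.Linear(d_model, rank * n_heads)
        self.k_feat = nn.Linear(d_model, rank * head_dim)
        self.v_head = nn.Linear(d_model, rank * n_heads)
        self.v_feat = nn.Linear(d_model, rank * head_dim)

    def factors(self, x: torch.Tensor):
        # x: (batch, seq, d_model) -> per-token rank-r factors (these are what gets cached)
        b, s, _ = x.shape
        kh = self.k_head(x).view(b, s, self.rank, self.n_heads)
        kf = self.k_feat(x).view(b, s, self.rank, self.head_dim)
        vh = self.v_head(x).view(b, s, self.rank, self.n_heads)
        vf = self.v_feat(x).view(b, s, self.rank, self.head_dim)
        return kh, kf, vh, vf

    @staticmethod
    def reconstruct(head_factor, feat_factor):
        # Sum of rank-1 outer products -> (batch, seq, n_heads, head_dim)
        return torch.einsum("bsrh,bsrd->bshd", head_factor, feat_factor)

cache = FactorizedKVCache(d_model=512, n_heads=8, head_dim=64, rank=2)
x = torch.randn(1, 10, 512)
kh, kf, vh, vf = cache.factors(x)                 # stored: O(rank * (heads + dim)) per token
K, V = cache.reconstruct(kh, kf), cache.reconstruct(vh, vf)  # fed to attention
```

With these toy sizes, each cached token costs 2 × (8 + 64) = 144 numbers per K or V instead of 8 × 64 = 512, which is where the memory saving comes from.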
-
Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled
By controlling for dimensionality expansion, this paper finds that as large language models (LLMs) grow in size, the contribution of training to predicting human reading times and brain imaging data actually decreases, revealing a potential misalignment between model and human sentence processing.
-
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
This paper studies catastrophic forgetting in end-to-end training of spoken language models (SLMs). It evaluates three mitigation strategies, model merging, discounting the LoRA scaling factor, and experience replay, finding experience replay the most effective, with further gains when it is combined with the other methods.
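A minimal sketch of experience replay in this continual-training setting: mix a small fraction of examples from the original training distribution into each batch of new-task data. The 10% replay ratio and function names are illustrative assumptions, not the paper's reported configuration.

```python
# Experience replay: interleave a few "old" examples into every batch of "new" data.
import random

def replay_batches(new_data, old_data, batch_size=32, replay_ratio=0.1):
    """Yield training batches where roughly replay_ratio of items come from old_data."""
    n_replay = max(1, int(batch_size * replay_ratio))
    n_new = batch_size - n_replay
    random.shuffle(new_data)
    for i in range(0, len(new_data) - n_new + 1, n_new):
        batch = new_data[i:i + n_new] + random.sample(old_data, n_replay)
        random.shuffle(batch)
        yield batch
```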
-
Activation Space Interventions Can Be Transferred Between Large Language Models
This paper demonstrates that activation space interventions for AI safety, such as backdoor removal and refusal steering, can be transferred between large language models using autoencoder mappings between their activation spaces, enabling smaller models to help align larger ones. Challenges remain for cross-architecture transfer and more complex tasks such as corrupted capabilities.
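A hedged sketch of the transfer idea: learn a mapping between the activation spaces of a source and a target model from paired activations on the same prompts, then push an intervention vector found in the source model through that mapping. The MLP mapper, hidden sizes, and training loop are assumptions for illustration, not the paper's exact recipe.

```python
# Transfer a steering direction from a source model's activation space to a target model's.
import torch
import torch.nn as nn

d_src, d_tgt = 2048, 4096  # hidden sizes of source and target models (assumed)

# Mapper trained to align the two models' activations on matched inputs.
mapper = nn.Sequential(nn.Linear(d_src, d_tgt), nn.ReLU(), nn.Linear(d_tgt, d_tgt))

def train_mapper(acts_src: torch.Tensor, acts_tgt: torch.Tensor, epochs=100, lr=1e-3):
    """acts_src/acts_tgt: (N, d) activations of the two models on the same prompts."""
    opt = torch.optim.Adam(mapper.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(mapper(acts_src), acts_tgt)
        loss.backward()
        opt.step()
    return mapper

# A direction found in the source model (e.g., a refusal direction) is mapped over
# and added to the target model's residual stream at inference time.
steer_src = torch.randn(d_src)
steer_tgt = mapper(steer_src).detach()
```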
-
Next Token Perception Score: Analytical Assessment of your LLM Perception Skills
This paper proposes the Next Token Perception Score (NTPS), a metric quantifying how well the feature subspace learned by autoregressive pretraining aligns with the subspace needed for downstream perception tasks. Theory and experiments show its correlation with linear-probe performance, and it is shown to be useful for predicting the gains from LoRA fine-tuning.
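The snippet below is not the paper's NTPS formula; it is a generic illustration of scoring subspace alignment between pretraining features and downstream-task features via principal angles, under the assumption that both are summarized by their top principal directions.

```python
# Generic subspace-alignment score (illustrative stand-in, not the NTPS definition).
import numpy as np

def subspace(X: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular directions of a centered feature matrix X (n_samples, d)."""
    _, _, vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return vt[:k].T                                   # (d, k) orthonormal basis

def alignment_score(feats_pretrain: np.ndarray, feats_task: np.ndarray, k: int = 16) -> float:
    """Mean squared cosine of the principal angles between the two top-k subspaces (1 = aligned)."""
    U, V = subspace(feats_pretrain, k), subspace(feats_task, k)
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Usage: feats_pretrain = hidden states driving next-token prediction,
# feats_task = hidden states paired with downstream perception labels.
score = alignment_score(np.random.randn(1000, 256), np.random.randn(1000, 256))
```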