Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Communicating Activations Between Language Model Agents
This paper introduces Activation Communication (AC), a novel method for inter-LLM communication that exchanges intermediate activations instead of natural-language messages, achieving up to a 27% performance improvement over traditional methods with significantly less compute across coordination games and reasoning benchmarks.
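A minimal sketch of the general idea in PyTorch: one agent's intermediate activations are grafted into another agent's residual stream instead of being decoded into text. The toy models, the grafting layer, and the linear adapter are illustrative assumptions, not the paper's actual architecture or procedure.

```python
# Sketch of activation-based communication between two models, assuming both
# expose per-layer hidden states. Models, layer index, and adapter are assumptions.
import torch
import torch.nn as nn

d_a, d_b, n_layers = 64, 48, 4

def make_stack(dim, n_layers):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

model_a = make_stack(d_a, n_layers)   # "sender" agent
model_b = make_stack(d_b, n_layers)   # "receiver" agent
adapter = nn.Linear(d_a, d_b)         # maps sender activations into receiver space

x_a = torch.randn(1, 10, d_a)         # sender's token embeddings (random placeholder)
x_b = torch.randn(1, 10, d_b)         # receiver's own token embeddings

# Run the sender only up to an intermediate layer k and keep its activations.
k = 2
h = x_a
for layer in model_a.layers[:k]:
    h = layer(h)

# Inject the adapted activations into the receiver's residual stream instead of
# exchanging natural-language messages, then finish the receiver's forward pass.
h_b = x_b
for i, layer in enumerate(model_b.layers):
    if i == k:
        h_b = h_b + adapter(h)        # simple additive grafting (an assumption)
    h_b = layer(h_b)

print(h_b.shape)                      # torch.Size([1, 10, 48])
```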
-
Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
Using linear probing and neuron-activation analysis, this paper reproduces and extends research on the roles of pre-training versus fine-tuning in knowledge acquisition for dense retrieval models, finding that pre-trained knowledge dominates retrieval effectiveness in DPR and that fine-tuning disperses knowledge, but that these conclusions do not hold across other architectures (e.g., Contriever, RepLlama) and representation strategies.
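A minimal linear-probing sketch of the kind used in such analyses: a linear classifier is trained on frozen encoder embeddings to test what knowledge they already encode. The random features below are placeholders for real DPR/Contriever/RepLlama embeddings and a real probing task.

```python
# Train a linear probe on frozen embeddings; accuracy above chance would mean
# the probed property is linearly decodable from that layer. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))      # stand-in for frozen pooled embeddings
y = rng.integers(0, 2, size=2000)     # stand-in for a binary property to probe

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Near-chance on random data; on real embeddings, compare probes across
# pre-trained vs. fine-tuned checkpoints to see where knowledge resides.
print("probe accuracy:", probe.score(X_te, y_te))
```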
-
How much do language models memorize?
This paper proposes an information-theoretic method for quantifying memorization that separates unintended memorization from generalization, measures the capacity of GPT-style language models at roughly 3.6 bits per parameter, and shows how the ratio of dataset size to model capacity affects double descent and membership-inference performance.
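A back-of-the-envelope sketch of how a bits-per-parameter capacity estimate can be used; the parameter count, dataset size, and per-token information figure below are made-up examples, not numbers from the paper.

```python
# Rough capacity arithmetic using the ~3.6 bits/parameter estimate.
params = 124_000_000                  # e.g., a GPT-2-small-sized model (assumption)
capacity_bits = 3.6 * params          # total storable information
capacity_mb = capacity_bits / 8 / 1e6

dataset_tokens = 1_000_000_000        # hypothetical training set size
bits_per_token = 1.5                  # assumed compressed information per token
dataset_bits = dataset_tokens * bits_per_token

print(f"model capacity ≈ {capacity_bits:.3e} bits ({capacity_mb:.1f} MB)")
# When dataset information far exceeds capacity, the model must generalize
# rather than memorize; the crossover region is where the paper relates
# capacity to double descent and membership-inference behavior.
print(f"dataset / capacity ratio ≈ {dataset_bits / capacity_bits:.1f}x")
```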
-
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
SEAL is a training-free method that calibrates the reasoning process of large language models by steering their latent representations to suppress redundant thoughts, achieving up to a 14.1% accuracy improvement and a 50.4% reduction in tokens across diverse benchmarks.
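A minimal activation-steering sketch in the spirit of editing latent representations at inference time; the toy model, chosen layer, steering vector, and scale are all assumptions, not SEAL's actual extraction or calibration procedure.

```python
# Training-free steering: shift one layer's output along a fixed direction
# via a forward hook. The direction here is random; a method like SEAL would
# derive it from contrasts between useful and redundant reasoning steps.
import torch
import torch.nn as nn

dim = 64
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=4)

steer = torch.randn(dim)
steer = steer / steer.norm()
alpha = -2.0                          # negative scale to suppress the direction

def steering_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output.
    return output + alpha * steer

handle = model.layers[2].register_forward_hook(steering_hook)
x = torch.randn(1, 16, dim)
y = model(x)                          # forward pass runs with steered activations
handle.remove()
print(y.shape)
```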
-
Towards Complementary Knowledge Distillation for Efficient Dense Image Prediction
This paper introduces a Boundary and Context Distillation (BCD) method for efficient dense image prediction that improves compact models' boundary completeness and region connectivity through targeted knowledge transfer, achieving superior accuracy across multiple tasks and datasets without increasing inference cost.
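A generic sketch of boundary-weighted distillation for dense prediction; the edge extraction and loss weighting are rough approximations of the boundary part only (a context/affinity term would be added similarly) and not the paper's exact BCD formulation.

```python
# Distill a compact student from a teacher on dense prediction, up-weighting
# pixels near class boundaries of the teacher's prediction. Weights, edge
# extraction, and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def boundary_mask(teacher_logits):
    # Mark pixels where the teacher's hard prediction changes between
    # neighbours, i.e. approximate class boundaries.
    pred = teacher_logits.argmax(dim=1, keepdim=True).float()
    dx = (pred[..., :, 1:] != pred[..., :, :-1]).float()
    dy = (pred[..., 1:, :] != pred[..., :-1, :]).float()
    mask = torch.zeros_like(pred)
    mask[..., :, 1:] += dx
    mask[..., 1:, :] += dy
    return (mask > 0).float()

def distill_loss(student_logits, teacher_logits, boundary_weight=2.0):
    # Pixel-wise KL between student and teacher, up-weighted on boundaries.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1),
        reduction="none",
    ).sum(dim=1, keepdim=True)
    w = 1.0 + boundary_weight * boundary_mask(teacher_logits)
    return (w * kl).mean()

student = torch.randn(2, 19, 64, 64, requires_grad=True)  # e.g. 19 classes
teacher = torch.randn(2, 19, 64, 64)
loss = distill_loss(student, teacher)
loss.backward()
print(float(loss))
```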