Posts
All the articles I've posted.
-
Radio: Rate-Distortion Optimization for Large Language Model Compression
This paper introduces Radio, a rate-distortion optimization framework for post-training LLM compression. By iteratively optimizing bit depths and applying companding quantization, it outperforms existing quantization methods in perplexity and downstream task accuracy, particularly at low bit depths.
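Companding means passing values through a nonlinear compressor before uniform quantization and through its inverse afterward, so that small-magnitude weights get finer resolution. The sketch below uses the classic mu-law compander with per-tensor max-abs scaling. Both choices, and the function names `companded_quantize`, `mu_law_compand`, and `mu_law_expand`, are illustrative assumptions and not necessarily the compander or bit-depth optimization that Radio uses.

```python
import numpy as np

def mu_law_compand(x, mu=255.0):
    # Logarithmic compressor for values in [-1, 1] (mu-law; an assumed choice).
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    # Exact inverse of mu_law_compand.
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def companded_quantize(w, bits, mu=255.0):
    # Hypothetical helper: compand, quantize uniformly to 2**bits levels, expand.
    scale = np.max(np.abs(w))
    if scale == 0:
        return np.zeros_like(w)
    y = mu_law_compand(w / scale, mu)            # normalized to [-1, 1], then compressed
    levels = 2 ** bits - 1
    q = np.round((y + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0  # uniform grid on [-1, 1]
    return mu_law_expand(q, mu) * scale          # back to the original weight scale
```

In a rate-distortion framing, `bits` would be chosen per weight group to trade off size against reconstruction error rather than fixed globally as in this sketch.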
-
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
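This paper proposes patch-level training, which aggregates multiple tokens into high-information-density patches and trains large language models in stages, halving training cost while maintaining or even slightly improving model performance.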
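A minimal sketch of the patch idea, assuming a patch embedding is the mean of `patch_size` consecutive token embeddings and that the model is trained to predict every token of the next patch from a single output distribution. Both assumptions, and the helper names `tokens_to_patches` and `next_patch_loss`, are illustrative and may not match the paper's exact formulation.

```python
import numpy as np

def tokens_to_patches(token_embeddings, patch_size):
    """Average every `patch_size` consecutive token embeddings into one patch embedding."""
    seq_len, d = token_embeddings.shape
    n_patches = seq_len // patch_size
    # Drop trailing tokens that do not fill a whole patch.
    trimmed = token_embeddings[: n_patches * patch_size]
    return trimmed.reshape(n_patches, patch_size, d).mean(axis=1)

def next_patch_loss(log_probs, next_patch_tokens):
    """Mean negative log-likelihood of all tokens in the next patch under one predicted distribution."""
    return -np.mean(log_probs[next_patch_tokens])
```

With a patch size of K, the model processes sequences about K times shorter during the patch-level stage, which is where the compute savings would come from before switching back to ordinary token-level training.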
-
Does Self-Attention Need Separate Weights in Transformers?
This paper introduces a shared-weight self-attention mechanism for transformers that uses a single weight matrix with diagonal scaling, reducing attention-block parameters by 66.53%. It achieves competitive performance on GLUE and improved noise robustness, while slightly underperforming standard BERT on SQuAD tasks.
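One way to read "a single weight matrix with diagonal scaling" is sketched below: a shared projection `w` is applied once, and queries, keys, and values differ only by per-dimension scale vectors. The exact parameterization, single-head setup, and the names `shared_weight_attention` and `attention_param_counts` are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shared_weight_attention(x, w, s_q, s_k, s_v):
    # x: (n, d) inputs; w: (d, d) shared projection; s_q, s_k, s_v: (d,) diagonal scales.
    h = x @ w                               # one shared projection for Q, K, and V
    q, k, v = h * s_q, h * s_k, h * s_v     # elementwise scaling == h @ diag(s)
    scores = q @ k.T / np.sqrt(x.shape[1])
    return softmax(scores) @ v

def attention_param_counts(d):
    # (separate Q/K/V weights, shared weight + three diagonals), ignoring biases.
    return 3 * d * d, d * d + 3 * d
```

Under this reading the projection parameters shrink from 3d² to d² + 3d, a reduction of roughly two thirds for typical hidden sizes.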
-
Toward Efficient Exploration by Large Language Model Agents
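By using LLMs to explicitly implement a posterior sampling RL algorithm, this paper significantly improves the exploration efficiency of LLM agents in natural-language environments while retaining the statistical performance advantages of the classical algorithm.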
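For reference, the classical idea behind posterior sampling in its simplest form is Thompson sampling on a Bernoulli bandit: sample one plausible model from the posterior, act greedily with respect to it, then update the posterior. This is a sketch of that textbook special case under a Beta(1, 1) prior, not the paper's LLM-based implementation; the function name `thompson_sampling` is illustrative.

```python
import numpy as np

def thompson_sampling(true_probs, steps, seed=0):
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    alpha = np.ones(n_arms)  # Beta posterior: 1 + observed successes per arm
    beta = np.ones(n_arms)   # Beta posterior: 1 + observed failures per arm
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(steps):
        # Draw one plausible success rate per arm, then act greedily on the draw.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = rng.random() < true_probs[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Posterior sampling RL extends the same loop to MDPs by sampling a whole environment model and following its optimal policy for an episode.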
-
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
This paper introduces SEFE, a method combining Answer Style Diversification (ASD) to mitigate superficial forgetting and RegLoRA to address essential forgetting in Multimodal Continual Instruction Tuning, achieving state-of-the-art performance on the CoIN benchmark.