Tag: Efficiency
All the articles with the tag "Efficiency".
-
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
This paper proposes MAET, which builds multilingual ability-enhanced large language models by extracting language-agnostic, ability-related weights and transferring them across languages, achieving roughly a 10% performance gain on mathematical and scientific tasks with only 60% of the computational resources and outperforming a range of baseline methods.
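The summary describes a task-vector-style pipeline: extract the weight changes tied to an ability, then graft them onto a model for another language. Below is a minimal sketch of that idea under loose assumptions; the top-fraction delta selection, the scaling factor `alpha`, and all function names are illustrative, not MAET's actual procedure.

```python
import torch

def extract_ability_weights(base_state, ability_state, top_ratio=0.2):
    """Keep the parameter deltas that change most after ability tuning."""
    deltas = {n: ability_state[n] - base_state[n] for n in base_state}
    # Rank tensors by mean absolute change; the most-changed fraction is
    # treated as the (approximately language-agnostic) ability weights.
    scores = {n: d.abs().mean().item() for n, d in deltas.items()}
    k = max(1, int(len(scores) * top_ratio))
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return {n: d for n, d in deltas.items() if n in keep}

def transfer_ability(target_state, ability_deltas, alpha=1.0):
    """Add the extracted ability deltas onto a multilingual target model."""
    return {n: w + alpha * ability_deltas[n] if n in ability_deltas else w
            for n, w in target_state.items()}
```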
-
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
This paper proposes the LZ penalty, which uses the change in code length under the LZ77 compression algorithm to dynamically adjust the sampling distribution of autoregressive language models, effectively eliminating degenerate repetition under greedy decoding while preserving performance on reasoning benchmarks.
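For intuition, here is a minimal sketch of how an LZ77-flavored penalty could reshape logits: tokens that would extend an already-repeated suffix of the context get penalized in proportion to the match length, a proxy for the code-length saving. The window size, match search, and `strength` parameter are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def lz_penalty(logits, history, strength=0.5, window=256, max_match=16):
    """Penalize next tokens that extend a repeated suffix of `history`."""
    ctx = list(history)[-window:]
    penalized = logits.clone()
    # For every earlier occurrence of the current suffix, the token that
    # followed it is exactly what LZ77 would encode cheaply next; penalize
    # it in proportion to the match length (longer match = larger saving).
    for match_len in range(1, min(max_match, len(ctx))):
        suffix = ctx[-match_len:]
        for start in range(len(ctx) - match_len):
            if ctx[start:start + match_len] == suffix:
                next_tok = ctx[start + match_len]
                penalized[next_tok] -= strength * match_len
    return penalized
```

At decoding time one would call `logits = lz_penalty(logits, generated_ids)` on each step before the argmax of greedy decoding.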
-
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
This paper presents empirical experiments to guide the choice of language models for specialized healthcare applications, highlighting the substantial advantages of fine-tuning small language models and of domain-specific pretraining, which let them outperform zero-shot large language models on targeted tasks.
-
Compact Recurrent Transformer with Persistent Memory
This paper introduces the Compact Recurrent Transformer (CRT), which combines shallow Transformers with RNNs to efficiently process long sequences using a single persistent memory vector. CRT achieves performance superior or comparable to full-length Transformers and Transformer-XL on language and video tasks at significantly reduced computational cost.
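As a rough illustration of carrying one persistent memory vector across segments, the sketch below pairs a shallow `nn.TransformerEncoder` with a `GRUCell` that compresses each processed segment back into the memory vector; this exact wiring is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CompactRecurrentBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # shallow stack
        self.memory_rnn = nn.GRUCell(d_model, d_model)  # updates the memory

    def forward(self, segment, memory):
        # Prepend the persistent memory vector as one extra token so the
        # segment can attend to a summary of everything seen so far.
        x = torch.cat([memory.unsqueeze(1), segment], dim=1)
        h = self.encoder(x)
        # Compress this segment into the single memory vector with the RNN.
        new_memory = self.memory_rnn(h[:, 1:].mean(dim=1), memory)
        return h[:, 1:], new_memory
```

A long sequence would then be processed segment by segment, carrying `memory` forward (optionally detached), so attention cost stays bounded by the segment length rather than the full sequence.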
-
Less is More: Towards Green Code Large Language Models via Unified Structural Pruning
This paper proposes Flab-Pruner, a unified structural pruning method that combines vocabulary, layer, and FFN pruning, using KL-divergence-based optimization and a custom fine-tuning strategy to shrink code LLMs while preserving high performance and efficiency.
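As a rough illustration of KL-guided structural pruning, the sketch below scores each layer by how much skipping it shifts the model's output distribution on a calibration batch, then drops the least disruptive layers. It assumes a simplified model whose layers live in an `nn.ModuleList` and can be swapped for `nn.Identity`, and which returns logits directly; the scoring loop is illustrative, not Flab-Pruner's actual algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def layer_kl_scores(model, layer_list, batch):
    """Score each layer by the KL shift caused by skipping it."""
    ref = F.log_softmax(model(batch), dim=-1)  # assumes model returns logits
    scores = []
    for i, layer in enumerate(layer_list):
        layer_list[i] = nn.Identity()           # temporarily skip the layer
        cur = F.log_softmax(model(batch), dim=-1)
        # KL(ref || pruned): how far the output distribution moved.
        scores.append(F.kl_div(cur, ref, log_target=True,
                               reduction="batchmean").item())
        layer_list[i] = layer                   # restore
    return scores

def prune_lowest(layer_list, scores, n_remove):
    """Drop the layers whose removal perturbs the output the least."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i])[n_remove:]
    return nn.ModuleList(layer_list[i] for i in sorted(keep))
```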