Tag: Efficiency
All the articles with the tag "Efficiency".
-
Large Language Model Compression with Global Rank and Sparsity Optimization
This paper introduces a two-stage LLM compression method that uses Robust PCA (RPCA) for low-rank and sparse decomposition and policy-gradient-based probabilistic pruning, outperforming state-of-the-art techniques at a 50% compression ratio while automatically adapting to layer-wise redundancy without manual thresholds or extensive fine-tuning.
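As a rough illustration of the decomposition step, here is a generic Robust PCA sketch (the standard ADMM / inexact ALM iteration, not the paper's actual compression pipeline) that splits a matrix into low-rank and sparse parts; the example matrix, parameters, and thresholds are all illustrative.

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=300):
    """Generic Robust PCA via ADMM:
    minimize ||L||_* + lam * ||S||_1  subject to  L + S = M.
    A sketch for illustration, not the paper's method."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    shrink = lambda X, tau: np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)  # soft threshold

    for _ in range(max_iter):
        # Singular-value thresholding -> low-rank component L
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        Lr = U @ np.diag(shrink(sig, 1.0 / mu)) @ Vt
        # Elementwise soft threshold -> sparse component S
        S = shrink(M - Lr + Y / mu, lam / mu)
        # Dual update and convergence check on the residual
        R = M - Lr - S
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return Lr, S

# Example: decompose a synthetic low-rank matrix with a few large outliers.
W = np.random.randn(128, 32) @ np.random.randn(32, 128)   # rank-32 part
W[np.random.rand(128, 128) < 0.01] += 5.0                  # sparse outliers
L_hat, S_hat = rpca(W)
print(np.linalg.matrix_rank(L_hat, tol=1e-3), (np.abs(S_hat) > 1e-3).mean())
```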
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
This paper presents the first systematic survey of progress on efficient reasoning for large language models, categorizing model-based, output-based, and prompt-based methods and examining strategies for reducing the "overthinking" phenomenon to improve computational efficiency while preserving reasoning capability.
-
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?
This paper introduces a framework to classify algorithmic innovations in LLMs as compute-dependent or compute-independent, demonstrating through small-scale GPT-2 experiments that compute-independent advancements like FlashAttention can yield up to 3.5× compute-equivalent gains even under hardware constraints, challenging the efficacy of hardware-focused AI regulation.
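To make "compute-equivalent gain" concrete, the sketch below interpolates a hypothetical baseline loss-vs-compute curve to find how much extra compute the baseline would need to match an improved run's loss; the numbers and the log-space interpolation are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Hypothetical loss-vs-compute measurements for a baseline training recipe.
baseline_flops = np.array([1e15, 2e15, 4e15, 8e15])  # compute budgets (FLOPs)
baseline_loss  = np.array([3.2, 3.0, 2.8, 2.6])      # loss reached at each budget
improved_flops = 2e15                                 # compute used by the improved run
improved_loss  = 2.7                                  # loss it reaches

# Interpolate (in log-compute) the baseline budget that reaches the same loss.
matched_log_c = np.interp(improved_loss, baseline_loss[::-1], np.log(baseline_flops)[::-1])
ceg = np.exp(matched_log_c) / improved_flops
print(f"compute-equivalent gain ~ {ceg:.1f}x")
```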
-
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
COSMOS introduces a cost-effective framework for predicting the performance and cost of LLM adaptation strategies such as QLoRA fine-tuning and retrieval-augmented in-context learning (ICL), achieving high prediction accuracy (1.09% MAE) while reducing the computational cost of strategy evaluation by 92.72% across eight diverse benchmarks.
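For reference, the MAE figure can be read as the average gap between predicted and observed benchmark scores; the toy computation below assumes the error is measured in accuracy percentage points and uses made-up values, not COSMOS outputs.

```python
import numpy as np

# Hypothetical predicted vs. observed accuracies (%) on four benchmarks.
predicted = np.array([71.2, 64.5, 80.1, 55.0])
observed  = np.array([70.0, 66.3, 78.2, 56.4])
mae = np.mean(np.abs(predicted - observed))
print(f"MAE = {mae:.2f} percentage points")
```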
-
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
This paper proposes the Spectral Dictionary Generative Model (SDGM), which replaces the self-attention mechanism with a learned global Fourier dictionary and per-token mixing coefficients, enabling efficient language modeling at O(KL) complexity while achieving competitive perplexity and substantial resource savings on benchmark datasets.
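A minimal PyTorch sketch of the general idea: a token-mixing layer built from K global sinusoidal atoms, costing O(K·L·d) rather than attention's O(L²·d). The layer, its parameterization, and the class name SpectralDictionaryMixer are assumptions for illustration, not the paper's exact SDGM.

```python
import math
import torch
import torch.nn as nn

class SpectralDictionaryMixer(nn.Module):
    """Token mixing with K global spectral atoms instead of self-attention.
    A hedged sketch of the idea in the summary above, not the paper's SDGM."""

    def __init__(self, n_atoms: int, d_model: int):
        super().__init__()
        self.freq  = nn.Parameter(torch.rand(n_atoms))           # learned atom frequencies
        self.phase = nn.Parameter(torch.zeros(n_atoms))          # learned atom phases
        self.gain  = nn.Parameter(torch.ones(n_atoms, d_model))  # per-channel mixing gains

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, d_model)
        L = x.size(1)
        t = torch.arange(L, device=x.device, dtype=x.dtype)      # positions 0..L-1
        # Dictionary of K sinusoidal atoms evaluated at the L positions: (K, L)
        atoms = torch.cos(2 * math.pi * self.freq[:, None] * t[None, :] / L
                          + self.phase[:, None])
        coeffs = torch.einsum("kl,bld->bkd", atoms, x)                   # project: O(K*L*d)
        mixed  = torch.einsum("kl,bkd->bld", atoms, coeffs * self.gain)  # reconstruct
        return mixed

# Usage: mix a batch of 2 sequences of length 128 with 16 atoms.
layer = SpectralDictionaryMixer(n_atoms=16, d_model=64)
y = layer(torch.randn(2, 128, 64))
print(y.shape)  # torch.Size([2, 128, 64])
```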