Tag: Efficiency
All the articles with the tag "Efficiency".
-
Toward Efficient Exploration by Large Language Model Agents
By using LLMs to explicitly implement a posterior-sampling RL algorithm, this work markedly improves the exploration efficiency of LLM agents in natural-language environments while retaining the statistical performance advantages of the classical algorithm.
-
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing
This paper presents MegaBeam-Mistral-7B, a 7B-parameter model that, through progressive training and system-level optimization, handles 512K-token long contexts and performs comparably to much larger models on several benchmarks, though its multi-fact reasoning still needs improvement.
-
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
LSAQ introduces a Layer-Specific Adaptive Quantization system for LLMs: it uses Jaccard similarity to assess layer importance and dynamically adjusts per-layer quantization precision to match edge-device resources. The result is higher zero-shot accuracy and lower perplexity than baseline methods, while enabling efficient deployment.
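The Jaccard-based importance idea in the summary above can be illustrated with a minimal sketch. The function and variable names here are hypothetical, not LSAQ's actual API: the assumption is that a layer whose top-k token set changes more between input and output (lower Jaccard similarity) is treated as more important and assigned higher quantization precision.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def layer_importance(top_tokens_in: set, top_tokens_out: set) -> float:
    # Hypothetical scoring: a layer that perturbs the top-k token set
    # more (lower similarity) is scored as more important, so it would
    # be kept at higher bit-width under a resource budget.
    return 1.0 - jaccard_similarity(top_tokens_in, top_tokens_out)

# Example: two layers, the second changes the top-k set more.
score_a = layer_importance({"the", "cat", "sat"}, {"the", "cat", "ran"})
score_b = layer_importance({"the", "cat", "sat"}, {"dog", "ran", "far"})
```

Under this sketch, `score_b > score_a`, so the second layer would receive more precision when bits are allocated.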
-
Accelerating Large Language Model Reasoning via Speculative Search
Speculative Search (SpecSearch) accelerates LLM reasoning by up to 2.12× with a bi-level speculative thought generator in which small and large models collaborate, while a quality-preserving rejection mechanism maintains comparable reasoning quality.
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which mitigates the over-smoothing problem by adding an inter-class cosine similarity loss during training and incorporates outlier scores during compression to retain critical tokens, substantially improving task-agnostic long-context compression performance and generalization.