Tag: Prediction
All the articles with the tag "Prediction".
-
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models
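This paper proposes two test-time compute scaling algorithms (knockout-style and league-style) that generate multiple candidate solutions and compare them pairwise. It proves that their failure probability decays exponentially or as a power law as compute increases, and validates the performance gains across multiple datasets and models.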
-
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
COSMOS is a cost-effective framework that predicts the performance and cost of LLM adaptation strategies such as QLoRA fine-tuning and retrieval-augmented ICL, achieving high accuracy (1.09% MAE) and reducing computational costs by 92.72% across eight diverse benchmarks.
-
Looped Transformers for Length Generalization
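This paper proposes Looped Transformers, which use a looped architecture with an adaptive number of steps to significantly improve Transformer length generalization on algorithmic tasks, outperforming conventional methods across a range of tasks.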
-
Compact Recurrent Transformer with Persistent Memory
This paper introduces the Compact Recurrent Transformer (CRT), which combines shallow Transformers with RNNs to efficiently process long sequences using a single persistent memory vector, achieving superior or comparable performance to full-length Transformers and Transformer-XL on language and video tasks with significantly reduced computational cost.
-
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
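This work proposes SpargeAttn, a universal sparse attention mechanism that accelerates inference for a variety of models through a two-stage online filter and quantization, with no loss in end-to-end performance.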