Tag: Prediction

All the articles with the tag "Prediction".

Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models

Published: 21 May, 2025 at 11:29 AM

86.09 🤔

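This paper proposes two test-time compute scaling algorithms, a knockout-style and a league-style one. Both generate multiple candidate solutions and compare them pairwise. The authors prove that the failure probability decreases exponentially or as a power law as compute increases, and they validate the performance gains on multiple datasets and models.
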
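The two selection schemes can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `prefer_first(a, b)` is a hypothetical judge (in practice, an LLM asked to compare two answers) that returns True when it prefers `a`, and `k` is the number of repeated comparisons per pair. The league variant here plays a full round robin; the paper may instead sample a fixed number of opponents per candidate.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
Judge = Callable[[T, T], bool]  # hypothetical: True if the first argument wins


def knockout(candidates: List[T], prefer_first: Judge, k: int = 1) -> T:
    """Knockout-style selection: pair candidates each round, keep the winners."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round: List[T] = []
        # With an odd pool, the last candidate gets a bye into the next round.
        if len(pool) % 2 == 1:
            next_round.append(pool.pop())
        for a, b in zip(pool[0::2], pool[1::2]):
            # Compare the pair k times; the majority winner advances.
            wins_a = sum(prefer_first(a, b) for _ in range(k))
            next_round.append(a if 2 * wins_a > k else b)
        pool = next_round
    return pool[0]


def league(candidates: List[T], prefer_first: Judge, k: int = 1) -> T:
    """League-style selection: return the candidate with the most pairwise wins."""
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates)
    scores = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i] += sum(prefer_first(candidates[i], candidates[j]) for _ in range(k))
    # max() returns the first index among ties.
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]


if __name__ == "__main__":
    # Toy judge: larger numbers are "better answers".
    answers = [3, 7, 1, 9, 4]
    print(knockout(answers, lambda a, b: a > b))
    print(league(answers, lambda a, b: a > b))
```

With a noiseless judge both schemes simply find the best candidate; the repetition parameter `k` matters only when the judge is noisy, where majority voting over repeated comparisons reduces the chance that a weaker candidate wins a single match.
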
COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Published: 10 May, 2025 at 11:05 AM

75.99 🤔

COSMOS introduces a cost-effective framework for predicting the performance and cost of LLM adaptation strategies such as QLoRA fine-tuning and retrieval-augmented in-context learning (ICL), achieving high prediction accuracy (1.09% mean absolute error) and reducing computational costs by 92.72% across eight diverse benchmarks.

Looped Transformers for Length Generalization

Published: 7 May, 2025 at 08:42 AM

74.13 🤔

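This paper proposes Looped Transformers, which use a looped structure with an adaptive number of steps to significantly improve the length generalization of Transformers on algorithmic tasks, outperforming conventional methods across a variety of tasks.
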
Compact Recurrent Transformer with Persistent Memory

Published: 9 May, 2025 at 11:06 AM

66.84 🤔

This paper introduces the Compact Recurrent Transformer (CRT), which combines shallow Transformers with RNNs to efficiently process long sequences using a single persistent memory vector, achieving superior or comparable performance to full-length Transformers and Transformer-XL on language and video tasks with significantly reduced computational cost.

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

Published: 4 May, 2025 at 04:28 PM

59.39 🤔

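This work proposes SpargeAttn, a universal sparse attention mechanism that accelerates inference for a wide range of models through a two-stage online filter and quantization techniques, while keeping end-to-end performance lossless.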