Tag: Robustness
All the articles with the tag "Robustness".
-
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Through model-specific datasets and belief-manipulation experiments, this paper shows that the retraction behavior of large language models (LLMs) is causally influenced by their internal beliefs, and that supervised fine-tuning substantially improves retraction performance.
-
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant
This paper comprehensively evaluates the impact of four quantization methods (GPTQ, AWQ, SmoothQuant, FP8) on instruction-tuned LLMs and SLMs ranging from 1B to 405B parameters across 13 datasets. It finds that quantized models often outperform smaller baselines but struggle with instruction following and hallucination detection, that FP8 is the most robust format, and that task difficulty does not always correlate with accuracy loss.
-
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Through theoretical and empirical analysis, this paper shows that model ensembling mitigates overadaptation in supervised fine-tuning by balancing the bias-variance trade-off, improving downstream task performance while reducing forgetting of pre-trained knowledge.
-
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
This paper proposes PTQ-Bench, a benchmark that systematically evaluates post-training quantization (PTQ) strategies for large language models across bit widths, architectures, and modalities. It finds that rotation-based and compensation-based strategies excel at low bit widths, argues that extremely low-bit quantization needs to be reexamined, and shows that combining compensation-based strategies with other methods markedly improves robustness.
-
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization
Through controlled experiments, analysis of internal mechanisms, and theoretical derivation, this paper shows that explicit chain-of-thought (CoT) training forms a two-stage generalization circuit that substantially improves large language models' in-distribution (ID) and out-of-distribution (OOD) reasoning generalization, and verifies its robustness under noisy training data.