Posts
All the articles I've posted.
-
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
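Through systematic experiments, this paper shows that pure reinforcement learning (RL) training not only improves the complex reasoning ability of large language models but also implicitly develops process reward model (PRM) capability. It proposes the Self-PRM framework to further improve performance, while also revealing its limitation of low precision on difficult problems.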
-
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
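This paper introduces MergeBench, a comprehensive benchmark for merging domain-specialized large language models. It evaluates eight merging methods on Llama and Gemma models (2B to 9B parameters), showing that merging works better on larger models and that sparsification and coefficient tuning are important for preserving knowledge, and it offers practical guidance on choosing a merging algorithm.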
-
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
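MINGLE proposes a test-time continual model merging method. It combines a mixture-of-low-rank-experts architecture with adaptive null-space-constrained gating to fuse models dynamically using a small number of unlabeled test samples. This substantially improves generalization in continual learning and reduces catastrophic forgetting.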
-
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
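This paper proposes LongReD, a multi-objective training strategy that combines long-text training, short-text distillation, and short-to-long distillation. It effectively mitigates the performance degradation of long-context large language models on short-text tasks while maintaining or improving their long-text capability.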
-
Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets
This paper systematically evaluates machine learning classifiers and imbalance-learning techniques on two cybersecurity datasets. It finds that XGB and RF perform robustly, while the effects of sampling and ensembling vary, which underscores the need for dataset-specific method selection.