Tag: Direct Preference Optimization

All the articles with the tag "Direct Preference Optimization".

ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance

Published: 2 Jun, 2025 at 11:32 AM

87.44 🤔

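ExpandR jointly optimizes a large language model and a dense retriever: the LLM generates semantically rich query expansions, and training combines DPO with contrastive learning, yielding gains of more than 5.8% across multiple retrieval benchmarks.

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?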

Published: 2 Jun, 2025 at 11:32 AM

86.23 🤔

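This paper identifies and validates the "shallow preference signal" phenomenon: reward models and DPO models trained on truncated preference datasets (keeping only the first 40%-50% of tokens) perform on par with, or even better than, those trained on the full datasets. It also reveals a limitation of current alignment methods, which focus too heavily on early tokens.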