Tag: Multimodal Systems

All the articles with the tag "Multimodal Systems".

Meeseeks: An Iterative Benchmark Evaluating LLMs Multi-Turn Instruction-Following Ability

Published: 4 May, 2025 at 04:31 PM

53.12 🤔

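This paper proposes Meeseeks, a multi-turn instruction-following benchmark that uses an iterative feedback mechanism to systematically evaluate the self-correction ability of LLMs, and finds that model performance improves significantly over multiple turns of interaction.

Codenames as a Benchmark for Large Language Models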

Published: 4 May, 2025 at 04:27 PM

77.18 👍

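This paper proposes the game Codenames as a benchmark for the reasoning abilities of LLMs. Its experiments evaluate how different LLMs perform in language understanding, strategic reasoning, and cooperation, revealing distinctive model behaviors and their potential to generalize.

Humanity's Last Exam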

Published: 4 May, 2025 at 04:28 PM

58.39 👍

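This paper introduces the HUMANITY'S LAST EXAM benchmark, which addresses the saturation of existing LLM benchmarks with challenging, expert-written multimodal questions and evaluates model capabilities on closed-ended academic tasks.

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models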

Published: 8 May, 2025 at 10:22 AM

97.91 😐

Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.

Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models

Published: 8 May, 2025 at 12:17 AM

96.36 😐

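This paper proposes a definition-guided prompting technique built on vision-language models, together with the UnHateMeme framework, for detecting and mitigating hateful content in multimodal memes. It detects hateful content effectively through zero-shot and few-shot prompting, and generates non-hateful replacements that preserve image-text coherence. Both components perform strongly in the reported experiments.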
