Tag: Multimodal Systems
All the articles with the tag "Multimodal Systems".
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey paper provides a comprehensive overview of adversarial attacks on multimodal AI systems across the text, image, video, and audio modalities. It categorizes threats by the attacker's knowledge, intention, and execution, giving practitioners an understanding of system vulnerabilities and cross-modal risks.
-
Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Selftok introduces a non-spatial autoregressive visual tokenizer built on diffusion timesteps. It unifies vision-language models and enables effective reinforcement learning, yielding stronger text-to-image generation, as demonstrated on the GenEval and DPG-Bench benchmarks.
-
Distilling LLM Agent into Small Models with Retrieval and Code Tools
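This paper proposes the Agent Distillation framework, which distills the interactive behavior of LLM agents into sLMs and combines this with first-thought prefix and self-consistent action generation methods. Together, these let small models achieve significant performance gains on factual and mathematical reasoning tasks, approaching or even surpassing larger CoT-distilled models.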
-
Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories
This exploratory study evaluates GPT-4o's multilingual and multimodal performance on physics concept inventories, revealing strong results in English and text-based tasks but significant weaknesses in visual interpretation and non-Western languages, highlighting implications for equitable AI integration in education.
-
1bit-Merging: Dynamic Quantized Merging for Large Language Models
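1bit-Merging proposes a dynamic model merging framework that combines 1-bit quantized task vectors with task-specific routing. It retains 94.53% of performance while reducing storage requirements to 55.02%, and outperforms both traditional and dynamic merging methods on general knowledge, mathematical reasoning, and code generation tasks.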