Posts
All the articles I've posted.
-
Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections
This paper introduces a thermal detector-based traffic light system using YOLO-Thermal, a modified YOLOv8 framework, to dynamically adjust signal timings for individuals with mobility restrictions, achieving superior detection accuracy (89.1% APval) and enhancing intersection accessibility while addressing privacy and adverse condition challenges.
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey paper provides a comprehensive overview of adversarial attacks on multimodal AI systems across text, image, video, and audio modalities, categorizing threats by attacker knowledge, intention, and execution to equip practitioners with knowledge of vulnerabilities and cross-modal risks.
-
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
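This paper uses model merging to combine fast-thinking and slow-reasoning capabilities for long-to-short reasoning, compressing response length by up to 55% on 7B models while preserving performance, and offers an efficient solution to the overthinking problem in large language models.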
-
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
PICD introduces a versatile perceptual image compression codec using diffusion rendering with three-tiered conditioning to achieve high text accuracy and visual quality for both screen and natural images, outperforming existing methods in key metrics like FID and text accuracy.
-
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
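Using supervised fine-tuning of the Qwen2.5-32B base model on only 920 distilled samples, this paper significantly outperforms resource-intensive Zero-RL methods and reveals how distilled models achieve more flexible reasoning through anthropomorphic language and advanced cognitive behaviors.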