Tag: Diffusion Model
All articles tagged "Diffusion Model".
-
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
PICD introduces a versatile perceptual image compression codec using diffusion rendering with three-tiered conditioning to achieve high text accuracy and visual quality for both screen and natural images, outperforming existing methods in key metrics like FID and text accuracy.
-
Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models
TSAD-C introduces a pioneering unsupervised framework for multivariate time-series anomaly detection on contaminated data, using a Decontaminator with S4-based diffusion, long-range dependency modeling via a time-then-graph approach, and anomaly scoring, achieving state-of-the-art performance across diverse datasets.
-
Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Selftok introduces a non-spatial autoregressive visual tokenizer using diffusion timesteps, unifying vision-language models and enabling effective reinforcement learning for superior text-to-image generation, as demonstrated on GenEval and DPG-Bench benchmarks.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
The Video Prediction Policy (VPP) is a generalist robot policy that uses predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, improving on state-of-the-art baselines by 41.5% on the Calvin ABC→D benchmark and by 31.6% on real-world dexterous manipulation tasks.
-
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
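This paper proposes DIFFEMBED, a text embedding method based on diffusion language models that exploits their bidirectional attention to significantly outperform autoregressive LLM embedding models on long-document retrieval and reasoning-intensive tasks, while performing comparably on traditional embedding tasks.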