Posts

All the articles I've posted.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Published: 8 May, 2025 at 10:22 AM

89.20 🤔

The Video Prediction Policy (VPP) introduces a novel generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving significant improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.

Always Skip Attention

Published: 8 May, 2025 at 11:06 AM

89.20 🤔

This paper theoretically demonstrates the ill-conditioning of Self-Attention Blocks in Vision Transformers without skip connections, highlights their role as regularizers, and proposes Token Graying (SVD and DCT) to improve input token conditioning, achieving modest performance gains in supervised and self-supervised tasks.

Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models

Published: 16 May, 2025 at 11:30 AM

89.11 🤔

This paper provides a theoretical analysis using Contextual Stochastic Block Models to demonstrate that graph attention mechanisms are beneficial for node classification only when structure noise exceeds feature noise, proposes a multi-layer GAT to achieve perfect classification at lower SNR thresholds, and validates these findings through synthetic and real-world experiments.

Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning

Published: 15 May, 2025 at 11:09 AM

89.11 🤔

This paper introduces the Objective Fairness Index (OFI), a legally grounded metric that evaluates bias in machine learning by comparing marginal benefits across groups. It detects algorithmic bias in applications such as COMPAS and the Folktables Adult Employment dataset, where the traditional Disparate Impact measure fails.

Model Merging in Pre-training of Large Language Models

Published: 21 May, 2025 at 11:14 AM

89.09 🤔

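This paper proposes Pre-trained Model Averaging (PMA), a strategy that merges checkpoints from the pre-training stage to significantly improve large language model performance, predict the effect of annealing, and strengthen training stability, offering a new method and practical guidelines for efficient model development.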
