
Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

Category: cs.CV | Tags: CV, Transformer, Trending, Embodied AI, Multimodal

Anonymous Authors

March 20, 2026

arXiv: 2603.19183v1

Authors: 1

Tags: 5

Content status: PDF included


Abstract

Vision-Language-Action (VLA) models have emerged as a promising approach for general-purpose robot manipulation, combining visual perception, language understanding, and action planning in a unified framework. This work applies mechanistic interpretability techniques using Sparse Autoencoders (SAEs) to analyze hidden layer activations in VLA models, revealing sparse dictionary features that provide interpretable bases for model computation. The research discovers that most SAE features correspond to memorized sequences from training demonstrations, while some features represent interpretable, generalizable motion primitives and semantic properties. This analysis offers insights into VLA model generalizability and provides a framework for steering model behavior through identified interpretable features, advancing the understanding of embodied AI systems for robot manipulation tasks.
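To make the two ideas in the abstract concrete (decomposing a hidden activation into sparse dictionary features, then steering along one feature), here is a minimal sketch in plain Python. It is not the paper's implementation: the class `SparseAutoencoder`, the helper `steer`, the dimensions, and the randomly initialized, tied weights are all assumptions chosen for illustration, and training (typically a reconstruction loss plus a sparsity penalty) is omitted entirely.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Toy SAE (hypothetical): f = ReLU(W_enc (x - b_dec) + b_enc),
    x_hat = sum_i f_i * W_dec[i] + b_dec. Weights are random, NOT trained."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                      for _ in range(d_dict)]
        self.b_enc = [0.0] * d_dict
        # Assumption: decoder tied to encoder, so row i is feature i's direction.
        self.w_dec = [row[:] for row in self.w_enc]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Map an activation vector to non-negative feature activations."""
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        """Reconstruct the activation as a weighted sum of feature directions."""
        out = self.b_dec[:]
        for fi, direction in zip(f, self.w_dec):
            if fi != 0.0:
                for j, dj in enumerate(direction):
                    out[j] += fi * dj
        return out


def steer(x, sae, feature_idx, alpha):
    """Push activation x along one feature's decoder direction by alpha."""
    direction = sae.w_dec[feature_idx]
    return [xi + alpha * di for xi, di in zip(x, direction)]


# Usage: a stand-in "hidden activation" (random here; a real one would be
# read from a chosen layer of the model under study).
rng = random.Random(1)
d_model, d_dict, feature_idx = 8, 32, 3
sae = SparseAutoencoder(d_model, d_dict)
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

features = sae.encode(x)
reconstruction = sae.decode(features)
steered = steer(x, sae, feature_idx, alpha=5.0)
steered_features = sae.encode(steered)
```

In an actual analysis the steered vector would be written back into the model's forward pass at the same layer, and the effect on the predicted actions would be observed. With the tied weights assumed here, steering with a positive `alpha` can only raise (never lower) the targeted feature's activation.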
