ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning
This paper presents ATG-MoE, an end-to-end autoregressive trajectory generation method with a mixture-of-experts architecture for learning robot assembly skills from demonstration. The method takes multi-modal inputs (RGB-D observations, natural language instructions, and robot proprioception) and generates manipulation trajectories in a closed-loop manner. It combines three components: multi-modal feature fusion for scene and task understanding, autoregressive sequence modeling for temporally coherent trajectory generation, and a mixture-of-experts architecture that enables unified multi-skill learning. The approach targets flexible manufacturing, where robot systems must adapt to changing tasks, objects, and environments without labor-intensive traditional programming.
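To make the combination of autoregressive generation and mixture-of-experts more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dense softmax gate over small linear experts, a fixed context vector standing in for the fused RGB-D, language, and proprioception features, and a 3-D waypoint representation. All names (`MoEStep`, `generate_trajectory`, `context`) and all dimensions are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class MoEStep:
    """Hypothetical dense mixture-of-experts layer: a gate mixes linear experts."""

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.gate = rand_matrix(num_experts, in_dim)  # one gate score per expert
        self.experts = [rand_matrix(out_dim, in_dim) for _ in range(num_experts)]

    def __call__(self, features):
        weights = softmax(matvec(self.gate, features))
        outputs = [matvec(expert, features) for expert in self.experts]
        # Weighted sum of expert outputs, one value per output dimension.
        mixed = [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(outputs[0]))
        ]
        return mixed, weights


def generate_trajectory(context, start_pose, layer, horizon):
    """Autoregressive rollout: each waypoint depends on the context and the previous waypoint."""
    trajectory = [list(start_pose)]
    for _ in range(horizon):
        features = list(context) + trajectory[-1]
        delta, _ = layer(features)  # predicted displacement for the next waypoint
        trajectory.append([p + d for p, d in zip(trajectory[-1], delta)])
    return trajectory


# Placeholder for fused multi-modal features (assumed, not from the paper).
context = [0.2, -0.1, 0.5, 0.3]
start = [0.0, 0.0, 0.0]
layer = MoEStep(in_dim=len(context) + len(start), out_dim=len(start), num_experts=4)
trajectory = generate_trajectory(context, start, layer, horizon=5)
```

In a closed-loop setting, one plausible use of such a rollout is to execute only the first few waypoints, re-encode the new observation into `context`, and generate again. Whether ATG-MoE uses dense or sparse expert routing, and how it tokenizes trajectories, is not specified in the abstract.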