End-to-End

46 papers in total

cs.CV, Autonomous Driving, End-to-End

Rectify, Don't Regret: Avoiding Pitfalls of Differentiable Simulation in Trajectory Prediction

In autonomous driving trajectory prediction, minor initial deviations in open-loop models compound over time and push the model into out-of-distribution states. The authors identify a shortcut learning problem in differentiable closed-loop simulators: gradients inadvertently leak future ground-truth information into earlier predictions, so the model learns non-causal regret instead of genuine recovery. To solve this, they propose a detached receding horizon rollout that severs the computation graph between simulation steps, forcing the model to learn authentic reactive recovery behaviors from drifted states. Evaluations on the nuScenes and DeepScenario autonomous driving datasets demonstrate that the approach achieves genuine trajectory rectification without temporal information leakage.

Anonymous Authors

28 days ago

arXiv 2603.23393v1
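
To make the idea of severing the computation graph between simulation steps concrete, here is a minimal toy sketch. It is not the authors' implementation: the 1-D dynamics, the single-parameter policy `theta`, and the names `Dual`, `detach`, and `rollout_loss` are all hypothetical and chosen only for illustration. A small forward-mode autodiff scalar tracks the gradient with respect to `theta`, and `detach` plays the role of a stop-gradient on the state carried into the next step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Scalar value plus its derivative w.r.t. one parameter (forward-mode autodiff)."""
    val: float
    grad: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)

    __radd__ = __add__
    __rmul__ = __mul__

    def __rsub__(self, other):
        return _lift(other) - self


def _lift(x):
    """Wrap plain numbers as constants (zero gradient)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def detach(x):
    """Keep the value, drop the gradient history (a stop-gradient analogue)."""
    return Dual(x.val, 0.0)


def rollout_loss(theta, targets, x0, detach_between_steps):
    """Hypothetical 1-D closed-loop rollout.

    At each step the policy moves a fraction `theta` of the way from the
    current state toward that step's target; the loss is the summed squared
    error. The returned Dual holds the loss and d(loss)/d(theta).
    """
    k = Dual(theta, 1.0)  # seed: d(theta)/d(theta) = 1
    x = Dual(x0)
    loss = Dual(0.0)
    for target in targets:
        if detach_between_steps:
            # Treat the (possibly drifted) state as a constant input, so this
            # step's error cannot send gradient into earlier steps.
            x = detach(x)
        x = x + k * (target - x)
        err = x - target
        loss = loss + err * err
    return loss


if __name__ == "__main__":
    full = rollout_loss(0.5, [1.0, 2.0], 0.0, detach_between_steps=False)
    cut = rollout_loss(0.5, [1.0, 2.0], 0.0, detach_between_steps=True)
    print(full)  # same loss value, gradient includes paths through earlier states
    print(cut)   # same loss value, gradient restricted to per-step reactions
```

In this toy setting both rollouts visit the same states and report the same loss; only the gradient differs, because the detached variant removes the path by which a later step's target influences an earlier step's action.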

cs.CL, Large Language Models, End-to-End

Off-Policy Value-Based Reinforcement Learning for Large Language Models

This paper targets data utilization efficiency in reinforcement learning for large language models. The authors propose ReVal, a Bellman-update-based method that enables off-policy learning by combining stepwise consistency signals with trajectory-level outcome verification. Because it supports replay-buffer-based training, ReVal can reuse past trajectories, improving sample efficiency over on-policy approaches. Experimental results on mathematical reasoning benchmarks, particularly with DeepSeek-R1-Distill-1.5B, show that ReVal converges faster and reaches better final performance, with an improvement of 2.7% on AIME24.

ReVal Team

28 days ago

arXiv 2603.23355v1
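
As a rough illustration of why a replay buffer combined with Bellman updates permits reuse of past trajectories, the sketch below runs generic tabular Q-learning over stored transitions. This is not ReVal itself: the abstract does not specify its update rule, and the transition layout, the function names `bellman_update` and `train_from_buffer`, and the two-step toy episode are assumptions made only for this example.

```python
import random
from collections import defaultdict, deque


def bellman_update(q, transition, alpha=0.5, gamma=0.9):
    """Apply one Q-learning (Bellman) backup to a stored transition."""
    state, action, reward, next_state, next_actions, done = transition
    # Bootstrap from the best next action unless the episode has ended.
    bootstrap = 0.0 if done else max(q[(next_state, a)] for a in next_actions)
    target = reward + gamma * bootstrap
    q[(state, action)] += alpha * (target - q[(state, action)])


def train_from_buffer(buffer, num_updates, seed=0):
    """Sample stored transitions repeatedly; no new environment rollouts needed."""
    rng = random.Random(seed)
    q = defaultdict(float)
    transitions = list(buffer)
    for _ in range(num_updates):
        bellman_update(q, rng.choice(transitions))
    return q


if __name__ == "__main__":
    # Hypothetical two-step episode collected once by some earlier policy.
    # The reward arrives only at the final step.
    buffer = deque(maxlen=1000)
    buffer.append(("s0", "a", 0.0, "s1", ["a"], False))
    buffer.append(("s1", "a", 1.0, "end", [], True))

    q = train_from_buffer(buffer, num_updates=500)
    print(q[("s1", "a")], q[("s0", "a")])
```

The update never asks which policy produced a transition, which is what makes the stored data reusable; an on-policy method would instead have to discard it after each policy change.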

cs.CV, Large Language Models, End-to-End

End-to-End Training for Unified Tokenization and Latent Denoising

Latent diffusion models enable high-fidelity synthesis by operating in learned latent spaces. However, current approaches require complex multi-stage training where tokenizers must be trained separately before diffusion models. This paper proposes UNITE, an autoencoder architecture that unifies tokenization and latent diffusion through a single Generative Encoder with shared weights. The key innovation is treating tokenization and generation as the same latent inference problem under different conditioning regimes. The method introduces a single-stage training procedure that jointly optimizes both tokenization and generation tasks via two forward passes, enabling gradients to jointly shape the latent space for improved visual representation learning.

End-to-End

Rectify, Don't Regret: Avoiding Pitfalls of Differentiable Simulation in Trajectory Prediction

Off-Policy Value-Based Reinforcement Learning for Large Language Models

End-to-End Training for Unified Tokenization and Latent Denoising

DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control

CayleyPy-4: AI-Holography. Towards analogs of holographic string dualities for AI tasks

Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation

Revisiting Quantum Code Generation: Where Should Domain Knowledge Live?

Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning

ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling

On the Failure of Topic-Matched Contrast Baselines in Multi-Directional Refusal Abliteration

Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch

Retrieving Climate Change Disinformation by Narrative

TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning

SecureBreak: A Dataset Towards Safe and Secure Models

Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe

SLURP-TN: Resource for Tunisian Dialect Spoken Language Understanding

P^2O: Joint Policy and Prompt Optimization

Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection