Autonomous Driving

56 papers in total

cs.CV · Autonomous Driving · CV

CounterScene: Counterfactual Causal Reasoning in Generative World Models for Safety-Critical Closed-Loop Evaluation

This paper presents CounterScene, a novel framework that enables structured counterfactual reasoning in generative Bird's Eye View (BEV) world models for safety-critical driving scenario generation. The approach addresses the challenge of creating realistic yet adversarially effective safety scenarios by introducing causal adversarial agent identification to determine critical agents and conflict types. CounterScene develops a conflict-aware interactive world model using causal interaction graphs to explicitly model dynamic inter-agent dependencies. The framework performs minimal interventions through stage-adaptive counterfactual guidance, effectively bridging the gap between realism and adversarial robustness in autonomous driving safety evaluation.

CounterScene Authors

28 days ago

arXiv 2603.21104v1

cs.CV · Autonomous Driving

DGRNet: Disagreement-Guided Refinement for Uncertainty-Aware Brain Tumor Segmentation

Accurate brain tumor segmentation from MRI scans is critical for diagnosis and treatment planning. Despite the strong performance of recent deep learning approaches, two fundamental limitations remain: (1) the lack of reliable uncertainty quantification in single-model predictions, which is essential for clinical deployment because the level of uncertainty may affect treatment decisions, and (2) the under-utilization of rich information in radiology reports that can guide segmentation in ambiguous regions. In this paper, we propose the Disagreement-Guided Refinement Network (DGRNet), a novel framework that addresses both limitations through multi-view disagreement-based uncertainty estimation and text-conditioned refinement. DGRNet generates diverse predictions via four lightweight view-specific adapters attached to a shared encoder-decoder, enabling efficient uncertainty quantification within a single forward pass. We then build disagreement maps to identify regions of high segmentation uncertainty, which are selectively refined according to clinical reports. Moreover, we introduce a diversity-preserving training strategy that combines pairwise similarity penalties and gradient isolation to prevent view collapse. Experimental results on the TextBraTS dataset show that DGRNet improves on the state of the art by 2.4% in Dice and 11% in HD95, the two main metrics, while providing meaningful uncertainty estimates.
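The abstract does not say how the disagreement maps are computed. As a minimal sketch, assuming per-pixel variance across the four view predictions is used as the disagreement score (an assumption, not the paper's stated method), the idea could look like this. The function names, the threshold, and the toy probabilities are all hypothetical.

```python
from statistics import pvariance


def disagreement_map(view_probs):
    """Per-pixel population variance across views.

    view_probs: list of V grids (H x W) of foreground probabilities,
    one grid per view-specific prediction. Variance is an assumed
    disagreement measure, chosen here only for illustration.
    """
    height, width = len(view_probs[0]), len(view_probs[0][0])
    return [
        [pvariance([view[r][c] for view in view_probs]) for c in range(width)]
        for r in range(height)
    ]


def uncertain_mask(dmap, threshold=0.05):
    """Flag pixels whose disagreement exceeds a (hypothetical) threshold."""
    return [[value > threshold for value in row] for row in dmap]


# Four hypothetical view predictions over a 1 x 3 image.
views = [
    [[0.9, 0.5, 0.1]],
    [[0.9, 0.1, 0.1]],
    [[0.9, 0.9, 0.1]],
    [[0.9, 0.5, 0.1]],
]
dmap = disagreement_map(views)
mask = uncertain_mask(dmap)  # only the middle pixel is flagged for refinement
```

Only pixels flagged in the mask would be passed to a refinement step, which is what keeps the refinement selective.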

Bahram Mohammadi +8

28 days ago

arXiv 2603.21086v1

cs.CV · Autonomous Driving · End-to-End

Single-Eye View: Monocular Real-time Perception Package for Autonomous Driving

This paper presents LRHPerception, a real-time monocular perception package designed for autonomous driving applications. The system interprets the surrounding environment from monocular camera video, combining the computational efficiency of end-to-end learning with detailed local mapping methodologies. It integrates object tracking and prediction, road segmentation, and depth estimation. The unified framework processes monocular images into a five-channel tensor containing RGB, road segmentation, and pixel-level depth information, enhanced with object detection and trajectory prediction. Experimental results show real-time performance at 29 FPS on a single GPU, a 555% speedup compared to baseline approaches.
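One plausible reading of the five-channel layout is three RGB channels plus one road-segmentation channel and one depth channel. The sketch below packs per-pixel values in that order; the channel order, the function name, and the sample values are assumptions made for illustration, not details taken from the paper.

```python
def pack_five_channel(rgb, road_mask, depth):
    """Stack RGB, road mask, and depth into an H x W grid of 5-tuples.

    rgb:       H x W grid of (red, green, blue) tuples
    road_mask: H x W grid of 0/1 road labels
    depth:     H x W grid of depths (units are an assumption: meters)
    """
    height, width = len(rgb), len(rgb[0])
    if len(road_mask) != height or len(depth) != height:
        raise ValueError("all inputs must have the same height")
    packed = []
    for r in range(height):
        if not (len(rgb[r]) == len(road_mask[r]) == len(depth[r]) == width):
            raise ValueError("all inputs must have the same width")
        row = []
        for c in range(width):
            red, green, blue = rgb[r][c]
            row.append((red, green, blue, float(road_mask[r][c]), float(depth[r][c])))
        packed.append(row)
    return packed


# Hypothetical 1 x 2 image: a red road pixel at 12.5 m, a green non-road pixel at 40 m.
packed = pack_five_channel(
    rgb=[[(255, 0, 0), (0, 255, 0)]],
    road_mask=[[1, 0]],
    depth=[[12.5, 40.0]],
)
```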

LRHPerception Team

28 days ago

arXiv 2603.21061v1

cs.CV · Autonomous Driving · CV

A Two-stage Transformer Framework for Temporal Localization of Distracted Driver Behaviors

This paper presents a temporal action localization framework specifically designed for driver monitoring systems in autonomous driving applications. The framework employs a two-stage pipeline combining VideoMAE-based feature extraction with an Augmented Self-Mask Attention detector to identify hazardous driving behaviors from in-cabin video streams. A Spatial Pyramid Pooling-Fast module captures multi-scale temporal features for improved localization accuracy. The approach is optimized for transportation safety checkpoints and fleet management assessment systems, demonstrating a trade-off between model capacity and computational efficiency.

Anonymous Authors

28 days ago

arXiv 2603.21048v1

cs.CV · Large Language Models · Autonomous Driving

KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph

This paper presents KLDrive, a novel knowledge-graph-augmented large language model reasoning framework specifically designed for fine-grained question answering in autonomous driving scenarios. The framework addresses critical challenges in autonomous driving perception by consolidating multi-source evidence through an energy-based scene fact construction module that builds reliable scene knowledge graphs. A specialized LLM agent performs fact-grounded reasoning over constrained action spaces using explicit structural constraints, combining structured prompting with few-shot in-context exemplars to adapt to diverse driving reasoning tasks. This approach tackles issues of unreliable scene facts, hallucinations, and opaque reasoning found in existing perception pipelines and driving-oriented LLM methods.

KLDrive Authors

28 days ago

arXiv 2603.21029v1

cs.AI · Autonomous Driving · Transformer

AutoMOOSE: An Agentic AI for Autonomous Phase-Field Simulation

This paper introduces AutoMOOSE, an open-source agentic framework designed to orchestrate the complete simulation lifecycle of multiphysics phase-field materials modeling through natural language prompts. The system employs a five-agent pipeline where an Input Writer coordinates six sub-agents, while a Reviewer autonomously diagnoses and corrects runtime failures without human intervention. A modular plugin architecture allows integration of new phase-field formulations, and a Model Context Protocol server exposes ten structured tools for external client interoperability. The framework is validated on a copper grain growth benchmark, demonstrating its ability to generate valid MOOSE simulation input files.

AutoMOOSE Authors

28 days ago

arXiv 2603.20986v1

cs.CV · Autonomous Driving · CV

OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation

This paper presents OmniPatch, a novel training framework designed to generate universal adversarial patches that can attack semantic segmentation models across different architectures including both Vision Transformers (ViT) and Convolutional Neural Networks (CNN). The approach addresses the critical challenge of black-box adversarial attacks in autonomous driving systems where target model parameters are unknown. By learning patches that generalize across images and architectures without requiring access to target models, this work provides a practical solution for evaluating robustness of deployed perception systems. The framework specifically targets semantic segmentation, which is essential for safe autonomous driving navigation.

Aarush Aggarwal +3

29 days ago

arXiv 2603.20777v1

cs.RO · Autonomous Driving

ToFormer: Towards Large-scale Scenario Depth Completion for Lightweight ToF Camera

Time-of-Flight (ToF) cameras combine a compact design with high measurement precision, which suits them to a variety of robot tasks. However, their limited sensing range restricts deployment in large-scale scenarios. Depth completion has emerged as a potential solution to expand the sensing range of ToF cameras, but existing research lacks dedicated datasets and struggles to generalize to ToF measurements. In this paper, we propose a full-stack framework that enables depth completion in large-scale scenarios for short-range ToF cameras. First, we construct a multi-sensor platform with a reconstruction-based pipeline to collect real-world ToF samples with dense large-scale ground truth, yielding the first LArge-ScalE scenaRio ToF depth completion dataset (LASER-ToF). Second, we propose a sensor-aware depth completion network that incorporates a novel 3D branch with a 3D-2D Joint Propagation Pooling (JPP) module and Multimodal Cross-Covariance Attention (MXCA), enabling effective modeling of long-range relationships and efficient 3D-2D fusion under non-uniform ToF depth sparsity. Moreover, our network can use the sparse point cloud from visual SLAM as a supplement to ToF depth to further improve prediction accuracy. Experiments show that our method achieves an 8.6% lower mean absolute error than the second-best method while maintaining a lightweight design that supports onboard deployment. Finally, to verify the system's applicability on real robots, we deploy the proposed method on a quadrotor running at 10 Hz, enabling reliable large-scale mapping and long-range planning in challenging environments for short-range ToF cameras.

Juncheng Chen +6

29 days ago

arXiv 2603.20669v1

cs.CV · Autonomous Driving · CV

GHOST: Ground-projected Hypotheses from Observed Structure-from-Motion Trajectories

We present a scalable self-supervised approach for segmenting feasible vehicle trajectories from monocular images for autonomous driving in complex urban environments. Our method leverages large-scale dashcam videos, treating recorded ego-vehicle motion as implicit supervision to recover camera trajectories via monocular structure-from-motion. These trajectories are projected onto the ground plane to generate spatial masks of traversed regions without manual annotation. We train a deep segmentation network that predicts motion-conditioned path proposals from a single RGB image at runtime, without explicit modeling of road or lane markings. The model implicitly captures scene layout, lane topology, and intersection structure, demonstrating generalization across varying camera configurations. We evaluate on nuScenes for reliable trajectory prediction and show transfer capability to an electric scooter platform.
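The abstract gives no details of the ground-plane projection step. As a rough sketch under stated assumptions (an ideal pinhole camera, a flat ground plane at a known camera height, and future trajectory points already expressed in the current camera frame), traversed ground points could be turned into an image mask as follows. The intrinsics, the camera height, and the function names are hypothetical, and a real pipeline would also fill in the vehicle's width rather than mark single pixels.

```python
def project_ground_points(points_xz, cam_height, fx, fy, cx, cy):
    """Project ground-plane points into the image with a pinhole model.

    points_xz: (x, z) pairs in the camera frame, x to the right, z forward.
    Ground points are assumed to sit cam_height below the camera
    (y pointing down), so every point has y = cam_height.
    """
    pixels = []
    for x, z in points_xz:
        if z <= 0:
            continue  # behind the camera, not visible
        u = fx * x / z + cx
        v = fy * cam_height / z + cy
        pixels.append((round(u), round(v)))
    return pixels


def rasterize(pixels, width, height):
    """Mark each in-bounds projected pixel in a binary H x W mask."""
    mask = [[0] * width for _ in range(height)]
    for u, v in pixels:
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 1
    return mask


# Hypothetical intrinsics and three future ego positions on the ground.
trajectory = [(0.0, 5.0), (0.0, 10.0), (1.0, 10.0)]
pixels = project_ground_points(trajectory, cam_height=1.5, fx=100, fy=100, cx=50, cy=30)
mask = rasterize(pixels, width=100, height=80)
```

Points farther ahead land higher in the image (smaller v), which matches the way a road recedes toward the horizon.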

Anonymous

29 days ago

arXiv 2603.20583v1

cs.RO · Autonomous Driving · Mapping

Multi-Robot Learning-Informed Task Planning Under Uncertainty

This paper addresses the challenge of coordinating multi-robot teams to complete complex tasks efficiently when task-relevant object locations are initially unknown. The proposed approach integrates learning-based estimation of uncertain environmental aspects with model-based planning to enable long-horizon coordination for teams of one, two, and three robots. The method focuses on reasoning about likely object locations, evaluating individual action contributions to overall task progress, and dynamically coordinating team efforts under uncertainty. Experimental results demonstrate efficient multi-stage task planning performance compared to competitive baselines in large problem domains.

Anonymous Authors

30 days ago

arXiv 2603.20544v1

cs.RO · Autonomous Driving · Mapping

High-Speed, All-Terrain Autonomy: Ensuring Safety at the Limits of Mobility

This paper presents a novel local trajectory planner designed for autonomous off-road vehicles operating on rugged terrain at high speeds. The approach addresses critical safety challenges by developing a Model Predictive Control formulation with a new dynamics model specifically tailored for non-planar terrain. A key innovation is an energy-based constraint that permits extreme maneuvers, including tire liftoff, while preventing the rollover events that current methods fail to mitigate. Real-time feasibility is achieved through parallelized GPGPU computation, allowing the system to perform complex trajectory optimization within operational time constraints. The planner's effectiveness is validated through both simulation and full-scale physical experiments, demonstrating trajectory generation that is both safe and extreme for autonomous off-road vehicles.

University research team on autonomous off-road vehicles

30 days ago

arXiv 2603.20525v1

cs.CV · Autonomous Driving · CV

Wildfire Spread Scenarios: Increasing Sample Diversity of Segmentation Diffusion Models with Training-Free Methods

This paper addresses the challenge of sample-efficient ambiguous segmentation in uncertain environments such as wildfire spread, medical diagnosis, and autonomous driving. The authors evaluate several training-free sampling methods including particle guidance and SPELL, adapting them from natural image generation to discrete segmentation tasks. They also propose a novel clustering-based technique to encourage diverse predictions from diffusion models. Validation is performed on the LIDC medical dataset, a modified Cityscapes dataset for autonomous driving scenarios, and a new MMFire wildfire spread simulation dataset. The work demonstrates that training-free methods can effectively generate diverse plausible segmentation outcomes without additional training overhead.

Anonymous Authors

30 days ago

arXiv 2603.20188v1

cs.CV · Autonomous Driving · CV

IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

IndoorR2X introduces a novel benchmark and simulation framework for LLM-driven multi-robot task planning in indoor environments. The system enables Robot-to-Everything communication by integrating observations from mobile robots and static IoT sensors like cameras to construct a global semantic state. This approach overcomes partial observability challenges that traditional R2R communication faces, reducing redundant exploration and enabling scalable scene understanding. The framework provides configurable simulation environments, sensor layouts, robot teams, and task suites for evaluating coordinated indoor robot operations.

IndoorR2X Authors

30 days ago

arXiv 2603.20182v1

cs.CV · Autonomous Driving · CV

The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning

This paper presents CRISP (Critique-and-Replan for Interactive Social Presence), an autonomous framework enabling robots to critique and replan their own social behaviors using a Vision-Language Model as a human-like social critic. The framework integrates joint and constraint extraction from robot description files, step-by-step behavior planning, low-level joint control code generation from visual information, VLM-based evaluation of social appropriateness, and iterative refinement through reward-based search. This approach enables robots to generate human-like, socially appropriate motions across various platforms with improved autonomy and naturalness.

Author 1 +1

30 days ago

arXiv 2603.20164v1

cs.CV · Autonomous Driving · End-to-End

X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

X-World is an action-conditioned multi-camera generative world model designed for scalable evaluation in end-to-end autonomous driving. The system generates realistic future observations by mapping synchronized multi-view camera history and future action sequences to video streams that accurately follow commanded driving actions. By simulating future multi-camera video outputs, X-World enables reproducible and controllable testing of vision-language-action (VLA) policies. The framework further supports optional control over dynamic traffic agents and static road elements, making it a comprehensive real-world simulator for autonomous vehicle development and validation.

X-World Team

30 days ago

arXiv 2603.19979v1

cs.CV · Autonomous Driving · CV

Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

Bird's-Eye-View (BEV) perception is fundamental for autonomous driving, providing a unified spatial representation that fuses surrounding-view images for downstream tasks like semantic segmentation, 3D object detection, and motion prediction. Current end-to-end BEV frameworks treat image-to-BEV transformation as a black box, lacking explicit 3D geometric understanding and often yielding suboptimal performance. This paper introduces Splat2BEV, a Gaussian Splatting-assisted framework that learns BEV feature representations combining semantic richness with geometric precision. By leveraging 3D Gaussian Splatting for reconstruction, the method explicitly incorporates 3D geometry awareness into the BEV perception pipeline, addressing the limitations of traditional end-to-end approaches.

Splat2BEV Authors

about 1 month ago

arXiv 2603.19193v1
