Large Language Models

53 papers in total

cs.SE · Large Language Models · Transformer

Evaluating LLM-Based Test Generation Under Software Evolution

Large language models (LLMs) are increasingly applied to automated unit test generation. This paper presents an empirical study of how well LLM-generated tests hold up as program code evolves. Using a mutation-driven framework covering 22,374 program variants and 8 LLMs, the researchers assess how generated tests respond to semantics-altering and semantics-preserving changes. The LLMs achieve strong baseline results on the original programs, with 79% line coverage and 76% branch coverage, but test quality degrades significantly under software evolution. The findings expose limitations of current LLM-based testing approaches and point to the need for more robust test generation methods that can adapt to code changes.

Research Team

26 days ago

arXiv 2603.23443v1

cs.CL · Large Language Models · End-to-End

Off-Policy Value-Based Reinforcement Learning for Large Language Models

This paper addresses the challenge of improving data utilization efficiency in reinforcement learning for large language models. The authors propose ReVal, a Bellman-update-based method that enables off-policy learning by combining stepwise consistency signals with trajectory-level outcome verification. Because it supports replay-buffer-based training, ReVal can reuse past trajectories, which improves sample efficiency over on-policy approaches. Experiments on mathematical reasoning benchmarks, particularly with DeepSeek-R1-Distill-1.5B, show that ReVal converges faster and reaches better final performance, including a 2.7% improvement on AIME24.

ReVal Team

26 days ago

arXiv 2603.23355v1

cs.CL · Large Language Models · Transformer

MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

Large language model (LLM)-based agents rely on memory mechanisms to reuse knowledge from past problem-solving experiences. Existing approaches typically construct memory in a per-agent manner, tightly coupling stored knowledge to a single model's reasoning style. In modern deployments with heterogeneous agents, a natural question arises: can a single memory system be shared across different models? We found that naively transferring memory between agents often degrades performance, as such memory entangles task-relevant knowledge with agent-specific biases. To address this challenge, we propose MemCollab, a collaborative memory framework that constructs agent-agnostic memory by contrasting reasoning trajectories generated by different agents on the same task.

MemCollab Authors

26 days ago

arXiv 2603.23234v1

cs.CV · Large Language Models · End-to-End

End-to-End Training for Unified Tokenization and Latent Denoising

Latent diffusion models enable high-fidelity synthesis by operating in learned latent spaces. However, current approaches require complex multi-stage training where tokenizers must be trained separately before diffusion models. This paper proposes UNITE, an autoencoder architecture that unifies tokenization and latent diffusion through a single Generative Encoder with shared weights. The key innovation is treating tokenization and generation as the same latent inference problem under different conditioning regimes. The method introduces a single-stage training procedure that jointly optimizes both tokenization and generation tasks via two forward passes, enabling gradients to jointly shape the latent space for improved visual representation learning.

Large Language Models

Evaluating LLM-Based Test Generation Under Software Evolution

Off-Policy Value-Based Reinforcement Learning for Large Language Models

MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

End-to-End Training for Unified Tokenization and Latent Denoising

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

The Dual Mechanisms of Spatial Reasoning in Vision-Language Models

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

Greater accessibility can amplify discrimination in generative AI

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

MemDLM: Memory-Enhanced DLM Training

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Gumbel Distillation for Parallel Text Generation

Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models

SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection

Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs

CayleyPy-4: AI-Holography. Towards analogs of holographic string dualities for AI tasks

Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement

Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation