APT: Adaptive Personalized Training for Diffusion Models with Limited Data

Jungwoo Chae *1
Jiyoon Kim *1
Jaewoong Choi 1
Kyungyul Kim 1
Sangheum Hwang †2
1LG CNS AI Research
2Department of Data Science, Seoul National University of Science and Technology

*Equal contribution, †Corresponding author

CVPR 2025 Main Conference

Given a few reference images, APT personalizes diffusion models with less overfitting. (Left) Comparing diffusion trajectories via the score matching loss, we observe that our method maintains the original denoising path: the predicted $\mathbf{x}_0$ images from APT closely resemble those of SDXL (the prior) during early steps, preserving the overall layout and scene context. (Right) APT effectively incorporates contextual elements from the prior, such as generating a backpack together with a person without "person" being mentioned explicitly, and preserves stylistic elements such as comic-book aesthetics. In contrast, other methods either focus excessively on the reference images or fail to maintain the prior's style. This demonstrates that APT retains the pretrained model's capabilities for text alignment and stylization.

Abstract

Personalizing diffusion models using limited data presents significant challenges, including overfitting, loss of prior knowledge, and degradation of text alignment. Overfitting shifts the noise prediction distribution, disrupting the denoising trajectory and causing the model to lose semantic coherence. In this paper, we propose Adaptive Personalized Training (APT), a novel framework that mitigates overfitting by employing adaptive training strategies and regularizing the model's internal representations during fine-tuning. APT consists of three key components: (1) Adaptive Training Adjustment, which introduces an overfitting indicator that detects the degree of overfitting in each timestep bin and applies adaptive data augmentation and adaptive loss weighting accordingly; (2) Representation Stabilization, which regularizes the mean and variance of intermediate feature maps to prevent excessive shifts in noise prediction; and (3) Attention Alignment for Prior Knowledge Preservation, which aligns the cross-attention maps of the fine-tuned model with those of the pretrained model to maintain prior knowledge and semantic coherence. Through extensive experiments, we demonstrate that APT effectively mitigates overfitting, preserves prior knowledge, and outperforms existing methods in generating high-quality, diverse images from limited reference data.
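
To make the three components above concrete, below is a minimal PyTorch sketch of how the corresponding loss terms could be combined during fine-tuning. The function names, the timestep binning, the per-bin weights, and the L2 distances on feature statistics and cross-attention maps are illustrative assumptions rather than the authors' exact implementation; the overfitting indicator that would set the bin weights (and drive adaptive augmentation) is omitted here.

    # Illustrative sketch of APT-style loss terms (assumed forms, not the official code).
    import torch
    import torch.nn.functional as F

    def adaptive_weighted_denoising_loss(eps_pred, eps_true, timesteps, bin_weights,
                                         num_bins=10, num_train_steps=1000):
        """Score-matching loss with an assumed per-timestep-bin weight (Adaptive Training Adjustment)."""
        per_sample = F.mse_loss(eps_pred, eps_true, reduction="none").mean(dim=(1, 2, 3))
        bins = (timesteps * num_bins // num_train_steps).clamp(max=num_bins - 1)
        return (bin_weights[bins] * per_sample).mean()

    def representation_stabilization_loss(feats_ft, feats_prior):
        """Match mean/variance of intermediate feature maps to the frozen prior (Representation Stabilization)."""
        loss = 0.0
        for f_ft, f_pr in zip(feats_ft, feats_prior):
            mu_ft, var_ft = f_ft.mean(dim=(2, 3)), f_ft.var(dim=(2, 3))
            mu_pr, var_pr = f_pr.mean(dim=(2, 3)), f_pr.var(dim=(2, 3))
            loss = loss + F.mse_loss(mu_ft, mu_pr) + F.mse_loss(var_ft, var_pr)
        return loss / len(feats_ft)

    def attention_alignment_loss(attn_ft, attn_prior):
        """Align cross-attention maps of the fine-tuned model with the pretrained model (Attention Alignment)."""
        return sum(F.mse_loss(a, b) for a, b in zip(attn_ft, attn_prior)) / len(attn_ft)

    # Toy usage with random tensors standing in for UNet outputs and hooked activations.
    if __name__ == "__main__":
        B, C, H, W = 4, 4, 64, 64
        eps_pred, eps_true = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
        timesteps = torch.randint(0, 1000, (B,))
        bin_weights = torch.ones(10)              # in APT these would come from the overfitting indicator
        feats_ft = [torch.randn(B, 320, 32, 32)]  # fine-tuned model features
        feats_pr = [torch.randn(B, 320, 32, 32)]  # frozen pretrained (prior) features
        attn_ft = [torch.rand(B, 8, 1024, 77)]    # fine-tuned cross-attention maps
        attn_pr = [torch.rand(B, 8, 1024, 77)]    # pretrained cross-attention maps

        total = (adaptive_weighted_denoising_loss(eps_pred, eps_true, timesteps, bin_weights)
                 + representation_stabilization_loss(feats_ft, feats_pr)
                 + attention_alignment_loss(attn_ft, attn_pr))
        print(total.item())

In practice the two regularization terms would be weighted against the denoising loss; those coefficients, like everything else in this sketch, are placeholders.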

Results

BibTeX citation

    @InProceedings{Chae_2025_CVPR,
        author    = {Chae, JungWoo and Kim, Jiyoon and Choi, JaeWoong and Kim, Kyungyul and Hwang, Sangheum},
        title     = {APT: Adaptive Personalized Training for Diffusion Models with Limited Data},
        booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
        month     = {June},
        year      = {2025},
        pages     = {28619-28628}
    }