Interrupting encoder training in diffusion models enables more efficient generative AI
A new framework for generative diffusion models, developed by researchers at Science Tokyo, significantly improves the efficiency of generative AI models. The method reinterprets Schrödinger bridge models as variational autoencoders with infinitely many latent variables, reducing computational costs and preventing overfitting. By appropriately interrupting the training of the encoder, the approach enables more efficient generative AI, with broad applicability beyond standard diffusion models.
Diffusion models are among the most widely used approaches in generative AI for creating images and audio. These models generate new data by gradually adding noise (noising) to real samples and then learning how to reverse that process (denoising) back into realistic data. A widely used variant, the score-based model, achieves this with a diffusion process that connects the prior distribution to the data over a sufficiently long time interval. This method has a limitation, however: when the data distribution differs strongly from the prior, the noising and denoising processes require longer time intervals, which slows down sample generation.
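To make the noising and denoising picture concrete, here is a minimal sketch (not from the study) of a variance-preserving diffusion on one-dimensional toy data. Because the toy data are Gaussian, the score of the noised distribution has a closed form, so no neural network is needed for the illustration.

```python
# Minimal sketch (illustrative, not the paper's method): forward noising and
# reverse denoising for a variance-preserving diffusion on 1-D toy data.
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, beta = 1.0, 1000, 8.0          # horizon, discretization steps, noise rate
dt = T / n_steps

def noise(x0):
    """Forward (noising) process: dx = -0.5*beta*x dt + sqrt(beta) dW."""
    x = x0.copy()
    for _ in range(n_steps):
        x += -0.5 * beta * x * dt + np.sqrt(beta * dt) * rng.normal(size=x.shape)
    return x

def denoise(xT, data_mean=2.0, data_var=0.25):
    """Reverse (denoising) process, simulated backward from t = T to 0,
    using the analytic score of the noised Gaussian toy data."""
    x = xT.copy()
    for i in reversed(range(n_steps)):
        t = (i + 1) * dt
        a = np.exp(-0.5 * beta * t)                  # signal scaling at time t
        var_t = data_var * a**2 + (1.0 - a**2)       # variance of the noised data
        score = -(x - a * data_mean) / var_t         # grad log p_t(x)
        x += (0.5 * beta * x + beta * score) * dt + np.sqrt(beta * dt) * rng.normal(size=x.shape)
    return x

x0 = rng.normal(2.0, 0.5, size=5000)       # toy "data": N(2, 0.25)
xT = noise(x0)                              # ends up close to the N(0, 1) prior
x_gen = denoise(rng.normal(size=5000))      # samples pulled back toward the data
print(xT.mean(), xT.var(), x_gen.mean(), x_gen.var())
```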
Now, a research team from the Institute of Science Tokyo (Science Tokyo), Japan, has proposed a new framework for diffusion models that is faster and computationally less demanding. They achieved this by reinterpreting Schrödinger bridge (SB) models, a type of diffusion model, as variational autoencoders (VAEs).
The study was led by graduate student Mr. Kentaro Kaba and Professor Masayuki Ohzeki from the Department of Physics at Science Tokyo, in collaboration with Mr. Reo Shimizu (then a graduate student) and Associate Professor Yuki Sugiyama from the Graduate School of Information Sciences at Tohoku University, Japan. Their findings were published in Physical Review Research on September 3, 2025.
SB models offer greater flexibility than standard score-based models because they can connect any two probability distributions over a finite time using a stochastic differential equation (SDE). This supports more complex noising processes and higher-quality sample generation. The trade-off, however, is that SB models are mathematically complex and expensive to train.
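The idea of connecting two distributions over a finite time with an SDE can be illustrated with a simple diffusion bridge. The toy sketch below steers each sample toward an explicitly drawn target, which a real Schrödinger bridge avoids by learning the drift, so it should be read only as an illustration of finite-time transport, not as the SB algorithm itself.

```python
# Illustrative only (not a trained Schrödinger bridge): a diffusion bridge that
# carries samples from one distribution to another over a *finite* time horizon
# by steering each path toward a drawn target. A real SB learns the drift so
# that no explicit pairing with target samples is needed.
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, sigma = 1.0, 500, 0.5
dt = T / n_steps

x = rng.normal(0.0, 1.0, size=2000)                                       # source: standard normal
target = rng.choice([-3.0, 3.0], size=2000) + 0.3 * rng.normal(size=2000)  # bimodal target

for i in range(n_steps):
    t = i * dt
    drift = (target - x) / (T - t)          # Brownian-bridge drift pins the endpoint
    x += drift * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)

# The endpoint marginal matches the bimodal target distribution.
print(x.mean(), x.std(), (x > 0).mean())
```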
The proposed method addresses this by reformulating SB models as VAEs with infinitely many latent variables. “The key insight lies in extending the number of latent variables from one to infinity, leveraging the data-processing inequality. This perspective enables us to interpret SB-type models within the framework of VAEs,” says Kaba.
In this setup, the encoder represents the forward process that maps real data onto a noisy latent space, while the decoder reverses that process to reconstruct realistic samples. Both processes are modeled as SDEs whose drift terms are learned by neural networks.
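A rough sketch of this structure, under an assumed parameterization (small time-conditioned drift networks and an Euler-Maruyama discretization, not the authors' code), might look like the following.

```python
# Minimal PyTorch sketch (assumed parameterization, not the authors' code):
# the encoder and decoder are SDEs whose drift terms are small neural networks.
import torch
import torch.nn as nn

class Drift(nn.Module):
    """Time-conditioned drift network f_theta(x, t)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        t_col = t.expand(x.shape[0], 1)           # broadcast the time to the batch
        return self.net(torch.cat([x, t_col], dim=1))

encoder_drift = Drift()   # forward (noising) SDE: data -> noisy latent space
decoder_drift = Drift()   # reverse (denoising) SDE: latent space -> data

def euler_maruyama(drift, x, t0, t1, n_steps=100, sigma=1.0):
    """Simulate dx = drift(x, t) dt + sigma dW from time t0 to t1."""
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        t = torch.full((1,), t0 + i * dt)
        x = x + drift(x, t) * dt + sigma * (abs(dt) ** 0.5) * torch.randn_like(x)
    return x

x0 = torch.randn(128, 2) * 0.5 + 2.0                  # toy data batch
z = euler_maruyama(encoder_drift, x0, 0.0, 1.0)       # encode: data -> latent
x_rec = euler_maruyama(decoder_drift, z, 1.0, 0.0)    # decode: latent -> data (meaningful once trained)
```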
The model employs a training objective with two components. The first is the prior loss, which ensures that the encoder correctly maps the data distribution to the prior distribution. The second is drift matching, which trains the decoder to mimic the dynamics of the reverse encoder process. Moreover, once the prior loss stabilizes, encoder training can be stopped early. This allows training to finish faster, reducing the risk of overfitting while preserving the high accuracy of SB models.
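Schematically, the two-part objective and the interruption of encoder training could be organized as in the loop below. This continues the previous sketch (reusing Drift, euler_maruyama, encoder_drift, and decoder_drift), and the two loss functions are simplified placeholders rather than the exact objective derived in the paper.

```python
# Schematic training loop (placeholder losses; the exact objective is derived
# in the paper). Reuses encoder_drift, decoder_drift and euler_maruyama from
# the previous sketch.
import torch

def prior_loss(batch):
    """Placeholder prior loss: a moment-matching surrogate checking whether
    the encoded batch resembles the standard-normal prior."""
    z = euler_maruyama(encoder_drift, batch, 0.0, 1.0)
    return z.mean().pow(2) + (z.var() - 1.0).pow(2)

def drift_matching_loss(batch):
    """Placeholder drift matching: push the decoder drift toward a stand-in
    for the reversed encoder dynamics at a random intermediate time."""
    t = torch.rand(1)
    with torch.no_grad():                          # only the decoder receives gradients
        x_t = euler_maruyama(encoder_drift, batch, 0.0, float(t))
        target = -encoder_drift(x_t, t)            # crude stand-in for the reverse drift
    return ((decoder_drift(x_t, t) - target) ** 2).mean()

enc_opt = torch.optim.Adam(encoder_drift.parameters(), lr=1e-3)
dec_opt = torch.optim.Adam(decoder_drift.parameters(), lr=1e-3)

encoder_frozen, prev_prior = False, float("inf")
for step in range(1000):
    batch = torch.randn(128, 2) * 0.5 + 2.0        # toy data

    if not encoder_frozen:
        lp = prior_loss(batch)
        enc_opt.zero_grad()
        lp.backward()
        enc_opt.step()
        # Interrupt encoder training once the prior loss has stabilized.
        if abs(prev_prior - lp.item()) < 1e-4:
            encoder_frozen = True
        prev_prior = lp.item()

    ld = drift_matching_loss(batch)
    dec_opt.zero_grad()
    ld.backward()
    dec_opt.step()
```

Freezing the encoder once the prior loss plateaus corresponds to what the article describes as interrupting encoder training; the specific stopping threshold used here is an assumption for illustration.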
“The objective function is composed of the prior loss and drift matching parts, which characterize the training of neural networks in the encoder and the decoder, respectively. Together, they reduce the computational cost of training SB-type models. It was demonstrated that interrupting the training of the encoder mitigated the challenge of overfitting,” explains Ohzeki.
This approach is flexible and can be applied to other classes of stochastic dynamics, even non-Markovian processes, making it a broadly applicable training scheme.
More information:
Kentaro Kaba et al, Schrödinger bridge-type diffusion models as an extension of variational autoencoders, Physical Review Research (2025). DOI: 10.1103/dxp7-4hby