Conversation


@HighCWu (Contributor) commented on Jan 13, 2023

PR types

New features

PR changes

Models

Describe

Add a diffusion module for training DiffSinger.

Use the normalized mel-spectrogram as the forward input of GaussianDiffusion; the forward pass returns the noised normalized mel-spectrogram and the target noise that was added. Compute an L1 or MSE loss between the noised mel-spectrogram and the target noise, weighted by non-padding weights, to train the denoising model.

A simple example of how to use the modules:

>>> import paddle
>>> import paddle.nn.functional as F
>>> from tqdm import tqdm
>>> 
>>> denoiser = WaveNetDenoiser()
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=1000, num_max_timesteps=100)
>>> x = paddle.ones([4, 80, 192]) # [B, mel_ch, T] # real mel input
>>> c = paddle.randn([4, 256, 192]) # [B, fs2_encoder_out_ch, T] # fastspeech2 encoder output
>>> loss = F.mse_loss(*diffusion(x, c))
>>> loss.backward()
>>> print('MSE Loss:', loss.item())
MSE Loss: 1.6669728755950928 
>>> def create_progress_callback():
>>>     pbar = None
>>>     def callback(index, timestep, num_timesteps, sample):
>>>         nonlocal pbar
>>>         if pbar is None:
>>>             pbar = tqdm(total=num_timesteps-index)
>>>         pbar.update()
>>> 
>>>     return callback
>>> 
>>> # ds=1000, K_step=60, scheduler=ddpm, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 1000
>>> K_step = 60
>>> scheduler_type = 'ddpm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
>>>     sample = diffusion.inference(
>>>         paddle.randn(x.shape), c, x_in,
>>>         num_inference_steps=infer_steps,
>>>         scheduler_type=scheduler_type,
>>>         callback=create_progress_callback())
100%|█████| 60/60 [00:03<00:00, 18.36it/s] 
>>> 
>>> # ds=100, K_step=100, scheduler=ddpm, from gaussian noise
>>> ds = 100
>>> infer_steps = 100
>>> K_step = 100
>>> scheduler_type = 'ddpm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
>>>     sample = diffusion.inference(
>>>         paddle.randn(x.shape), c, x_in, 
>>>         num_inference_steps=infer_steps,
>>>         scheduler_type=scheduler_type,
>>>         callback=create_progress_callback())
100%|█████| 100/100 [00:05<00:00, 18.29it/s] 
>>> 
>>> # ds=1000, K_step=1000, scheduler=pndm, infer_step=25, from gaussian noise
>>> ds = 1000
>>> infer_steps = 25
>>> K_step = 1000
>>> scheduler_type = 'pndm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
>>>     sample = diffusion.inference(
>>>         paddle.randn(x.shape), c, x_in,
>>>         num_inference_steps=infer_steps,
>>>         scheduler_type=scheduler_type,
>>>         callback=create_progress_callback())
100%|█████| 25/25 [00:01<00:00, 19.75it/s]
>>> 
>>> # ds=1000, K_step=100, scheduler=pndm, infer_step=50, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 50
>>> K_step = 100
>>> scheduler_type = 'pndm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
>>>     sample = diffusion.inference(
>>>         paddle.randn(x.shape), c, x_in,
>>>         num_inference_steps=infer_steps,
>>>         scheduler_type=scheduler_type,
>>>         callback=create_progress_callback())
100%|█████| 5/5 [00:00<00:00, 23.80it/s]
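
The examples above compute a plain MSE loss over the whole padded tensor. The non-padding weighting mentioned in the description could be applied as in the sketch below; the mel lengths and the mask construction here are assumptions for illustration only, not part of this PR:

>>> # Sketch: weight the loss with a non-padding mask (assumed mel lengths).
>>> pred, target = diffusion(x, c)                     # the two tensors compared by the MSE loss above
>>> mel_lens = paddle.to_tensor([192, 180, 160, 150])  # [B], assumed valid frame counts
>>> frame_ids = paddle.arange(x.shape[-1]).unsqueeze(0)            # [1, T]
>>> mask = (frame_ids < mel_lens.unsqueeze(1)).astype(pred.dtype)  # [B, T]
>>> mask = mask.unsqueeze(1)                                       # [B, 1, T]
>>> loss = F.mse_loss(pred * mask, target * mask, reduction='sum') / (mask.sum() * x.shape[1])
>>> loss.backward()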

The method for computing the loss and training the denoising model can refer to train_text_to_image.py.
The inference process refers to pipeline_stable_diffusion_img2img.py.
To implement GaussianDiffusionShallow for DiffSinger, set num_max_timesteps to K_step during training so that the time steps that do not need to be trained are skipped.
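
For reference, a minimal sketch of that shallow-diffusion setup, reusing denoiser, x and c from the example above and the ds/K_step values of the first inference example:

>>> # Sketch: shallow diffusion — limit training to the first K_step of ds timesteps,
>>> # then denoise starting from the auxiliary FastSpeech2 mel output at inference.
>>> ds, K_step = 1000, 60
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> loss = F.mse_loss(*diffusion(x, c))   # per the description above, trained timesteps are capped by num_max_timesteps
>>> with paddle.no_grad():
>>>     sample = diffusion.inference(
>>>         paddle.randn(x.shape), c, x,  # x: aux fs2 mel output as the starting reference
>>>         num_inference_steps=ds,
>>>         scheduler_type='ddpm',
>>>         callback=create_progress_callback())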

@yt605155624 added this to the r1.4.0 milestone on Jan 13, 2023
@yt605155624 changed the title add diffusion module for training diffsinger → [TTS]add diffusion module for training diffsinger on Jan 13, 2023
@yt605155624 requested a review from lym0302 on Jan 13, 2023 10:21
@yt605155624 mentioned this pull request on Jan 13, 2023

@yt605155624 (Collaborator) left a comment

LGTM

@yt605155624 merged commit 57b9d4b into PaddlePaddle:develop on Jan 13, 2023