Making music with Moûsai

2025-10-20, access: $$$ Pro

Video, applications, model-intro, audio, diffusion

The latent diffusion concept applied to music generation: a transformer-type text model generates embeddings from a prompt. These embeddings guide a diffusion model that creates encoded spectrograms in a latent space, and a second diffusion model translates those latents into audio waveforms.
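The three-stage data flow described above can be sketched in a few lines of plain Python. This is only a toy illustration of how the stages connect, not the Moûsai implementation: every function name, size, and update rule here (`embed_prompt`, `generate_latent`, `decode_to_waveform`, `LATENT_LEN`, and so on) is a hypothetical stand-in, the latent is a flat list rather than an encoded spectrogram, and the "denoiser" simply pulls noise toward a conditioning-derived target instead of using a trained network.

```python
import math
import random

# All names, sizes, and update rules below are hypothetical stand-ins,
# not the actual Moûsai models.
EMBED_DIM = 8     # size of the toy text embedding
LATENT_LEN = 16   # length of the toy latent sequence
UPSAMPLE = 4      # waveform samples produced per latent value
STEPS = 10        # number of toy denoising steps


def embed_prompt(prompt):
    """Stage 1 stand-in for the text model: prompt -> fixed-size unit vector."""
    vec = [0.0] * EMBED_DIM
    for i, ch in enumerate(prompt):
        vec[i % EMBED_DIM] += ord(ch) / 1000.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def denoise(x, target, steps):
    """Toy reverse process: start from noise and move stepwise toward a target.

    A real diffusion model would predict the noise with a trained network;
    here the update is just a fraction of the gap to the target.
    """
    for step in range(steps):
        rate = 1.0 / (steps - step)  # the final step closes the remaining gap
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return x


def generate_latent(embedding, rng):
    """Stage 2: diffusion in the latent space, guided by the text embedding."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(LATENT_LEN)]
    target = [embedding[i % EMBED_DIM] for i in range(LATENT_LEN)]
    return denoise(noise, target, STEPS)


def decode_to_waveform(latent, rng):
    """Stage 3: a second diffusion pass, guided by the latent, at audio resolution."""
    n = len(latent) * UPSAMPLE
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target = [math.tanh(latent[i // UPSAMPLE]) for i in range(n)]
    return denoise(noise, target, STEPS)


def text_to_audio(prompt, seed=0):
    """Chain the three stages: prompt -> embedding -> latent -> waveform."""
    rng = random.Random(seed)
    embedding = embed_prompt(prompt)
    latent = generate_latent(embedding, rng)
    return decode_to_waveform(latent, rng)


waveform = text_to_audio("slow jazz piano")
print(len(waveform))  # 64
```

The point of the sketch is the shape of the pipeline: the prompt only ever influences the audio through the embedding, and the waveform stage only ever sees the latent, which is much shorter than the final signal (16 values versus 64 samples in this toy setup).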
