Announcement_8
New preprint: Learned Relay Representations for Forward-Thinking Discrete Diffusion Models! We train masked diffusion models to carry learned latent state across denoising steps, outperforming standard SFT on coding tasks while reducing inference latency by up to 32%. Code.