high-resolution images

  • SIID: Scale Invariant Image Diffusion Model


    [P] SIID: A scale invariant pixel-space diffusion model; trained on 64x64 MNIST, generates readable 1024x1024 digits for arbitrary ratios with minimal deformities (25M parameters)

    The Scale Invariant Image Diffuser (SIID) is a new diffusion model architecture designed to address a weakness of existing backbones such as UNet and DiT, which struggle when pixel density and resolution change. SIID uses a dual relative positional embedding system that lets it preserve an image's composition across resolutions and aspect ratios, so that when more pixels are introduced the model refines the existing content rather than adding new information. Trained only on 64×64 MNIST images, SIID can generate readable 1024×1024 digits with minimal deformities, and it scales this way without relying on data augmentation. This matters because it offers a more flexible and efficient approach to image generation, which could benefit applications that require high-resolution image synthesis.
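    The summary does not say how SIID's dual relative positional embeddings are computed, so the following is only a hypothetical sketch of the general idea, not the authors' implementation. It assumes two coordinate pairs per pixel: one measured relative to the image frame (composition) and one measured from the image center in units of the shorter side (so aspect ratio does not distort distances). The helper names `pixel_centers`, `dual_relative_coords`, and `avg_pool` are invented for illustration. The sketch demonstrates the property the summary describes: the coordinates of a 64×64 grid equal the average-pooled coordinates of a 1024×1024 grid, so extra pixels sample the same layout more densely instead of extending it.

```python
import numpy as np


def pixel_centers(n: int) -> np.ndarray:
    """Centers of n pixels on the unit interval, i.e. (i + 0.5) / n."""
    return (np.arange(n) + 0.5) / n


def dual_relative_coords(height: int, width: int) -> np.ndarray:
    """Return a (height, width, 4) grid of resolution-independent coordinates.

    Hypothetical scheme for illustration only (not SIID's published method):
      channels 0-1: (y, x) relative to the image frame, inside (-1, 1), so the
                    same relative location gets the same value at any resolution
      channels 2-3: (y, x) measured from the center in units of the shorter
                    side, so one unit is the same length on both axes
    """
    ys = pixel_centers(height) * 2.0 - 1.0  # frame-relative y
    xs = pixel_centers(width) * 2.0 - 1.0   # frame-relative x
    short = min(height, width)
    # Rescale so both axes share the shorter side as their unit length.
    ys_iso = ys * (height / short)
    xs_iso = xs * (width / short)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    yy_iso, xx_iso = np.meshgrid(ys_iso, xs_iso, indexing="ij")
    return np.stack([yy, xx, yy_iso, xx_iso], axis=-1)


def avg_pool(grid: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) grid by an integer factor on both spatial axes."""
    h, w, c = grid.shape
    blocks = grid.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    coarse = dual_relative_coords(64, 64)    # training resolution
    fine = dual_relative_coords(1024, 1024)  # sampling resolution (16x per side)
    # Pooling the fine grid recovers the coarse grid: same layout, denser sampling.
    print(np.allclose(avg_pool(fine, 16), coarse))  # True
```

    Because both coordinate pairs are linear in the pixel center, averaging each 16×16 block of the fine grid lands exactly on the corresponding coarse pixel center. A real model would typically feed such coordinates through sinusoidal or learned features before attention; that step is omitted here to keep the scale-invariance property easy to check.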
