The Scale Invariant Image Diffuser (SIID) is a new diffusion model architecture designed to overcome limitations in existing models like UNet and DiT, which struggle with changes in pixel density and resolution. SIID achieves this by using a dual relative positional embedding system that allows it to maintain image composition across varying resolutions and aspect ratios, while focusing on refining rather than adding information when more pixels are introduced. Trained on 64×64 MNIST images, SIID can generate readable 1024×1024 images with minimal deformities, demonstrating its ability to scale effectively without relying on data augmentation. This matters because it introduces a more flexible and efficient approach to image generation, potentially enhancing applications in fields requiring high-resolution image synthesis.
SIID offers a novel approach to image diffusion that addresses limitations of existing architectures like UNet and DiT. Traditional models often struggle to scale across resolutions because they rely on fixed convolution kernels and positional embeddings that do not adapt well to changes in pixel density. SIID tackles this by ensuring that adding more pixels refines the image rather than introducing new information. This lets the model upscale images significantly without losing the integrity of the original features, making it particularly useful for applications that require high-resolution outputs from low-resolution inputs.
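One way to picture the "more pixels refine rather than add" idea is to treat the image as a function over a fixed coordinate frame, so a higher resolution just samples that frame more densely. The sketch below is an illustrative assumption, not the paper's actual implementation; the function name and normalization are hypothetical.

```python
import numpy as np

def coord_grid(h, w):
    """Normalized pixel-center coordinates in [0, 1] x [0, 1].

    The frame is resolution independent: doubling h and w samples the
    same underlying image function more densely instead of extending it,
    so extra pixels add detail rather than new content.
    """
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    return np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)

lo = coord_grid(64, 64)      # training resolution
hi = coord_grid(1024, 1024)  # inference resolution
# Both grids span the same [0, 1] frame; hi samples it 16x more densely.
print(lo.shape, hi.shape)    # (64, 64, 2) (1024, 1024, 2)
```

Under this framing, a model conditioned on such coordinates never sees positions outside the range it trained on, which is one plausible reason a 64×64-trained model can be queried at 1024×1024.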
SIID’s architecture is designed to be efficient and flexible, combining pixel unshuffle for speed with a dual relative positional embedding system that maintains image composition across different resolutions and aspect ratios. The dual system lets the model track both the overall composition of an image and its local spatial features, so it can adapt dynamically to new resolutions and aspect ratios. The model’s ability to produce readable 1024×1024 images after training only on 64×64 MNIST images demonstrates its potential to scale well beyond its training resolution, a significant advancement over existing models.
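The two ingredients named above can be sketched concretely. Pixel unshuffle is a standard rearrangement (it mirrors PyTorch's `pixel_unshuffle`) that folds spatial blocks into channels so the network processes a shorter sequence; the dual-embedding helper is a hypothetical illustration of the idea of keeping one resolution-normalized and one pixel-unit view of relative positions — its names and normalization are assumptions, not SIID's actual formulation.

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Fold r x r spatial blocks of an (H, W, C) array into channels,
    giving (H//r, W//r, C*r*r). Same rearrangement as torch's
    pixel_unshuffle; trades spatial size for channel depth."""
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // r, w // r, c * r * r)

def dual_relative_positions(n):
    """Two views of the same relative offsets along one axis of length n:
    - 'composition': offsets divided by the axis length, so "a quarter of
      the image away" maps to the same value at every resolution;
    - 'local': offsets in raw pixel units, keeping fine spatial features
      meaningful as pixel density grows.
    Both are (n, n) matrices; this pairing is an illustrative sketch."""
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]
    return offsets / n, offsets.astype(float)

x = np.random.rand(64, 64, 1)
folded = pixel_unshuffle(x, 2)          # shape (32, 32, 4)
comp, local = dual_relative_positions(64)
```

The composition view stays bounded in (-1, 1) regardless of resolution, while the local view grows with pixel count; feeding both gives the model the global layout signal and the fine-detail signal at once.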
Why does this matter? In fields like digital art, medical imaging, and satellite imagery, the ability to generate high-resolution images from low-resolution data can be transformative. It allows for the production of detailed images without the need for extensive computational resources or large datasets, which are often expensive and time-consuming to obtain. Furthermore, the efficiency of SIID’s architecture, with only 25 million parameters, suggests that it could be deployed in environments with limited computational power, broadening its applicability.
While SIID shows promise, there are still areas for improvement, such as reducing artifacts at extreme resolutions and refining the learning rate scheduling to optimize performance. However, its ability to maintain image quality across a wide range of resolutions and aspect ratios without data augmentation is a testament to its robust design. As the model continues to be refined and tested, it could set a new standard for image diffusion models, offering a scalable and efficient solution for generating high-quality images from limited data. This innovation could pave the way for more accessible and versatile image processing technologies in various industries.
Read the original article here

![[P] SIID: A scale invariant pixel-space diffusion model; trained on 64x64 MNIST, generates readable 1024x1024 digits for arbitrary ratios with minimal deformities (25M parameters)](https://www.tweakedgeek.com/wp-content/uploads/2025/12/featured-article-6357-1024x585.png)