Stable Diffusion 3 API Now Available on Novita AI

Gamechanger alert! Stable Diffusion 3 is here - and it's officially open-sourced by Stability AI.

Be among the first! Join Novita AI's waitlist for early access to the Stable Diffusion 3 Medium model API and unleash your imagination.

Stable Diffusion 3 Medium Open Weights enables the creation of hyper-realistic, intricate visuals with unprecedented ease. The open sourcing of this model marks an exciting new era where the community can unlock the full potential of this game-changing generative AI tool.

In this blog, we'll provide a comprehensive introduction to Stable Diffusion 3, including its updated features and technical details. We'll also guide you on how to get the model and integrate it into your own projects. Let's dive in!

Introduction to Stable Diffusion 3

Stable Diffusion 3 (SD 3) has made great strides in image quality, prompt comprehension, and operational efficiency, making it a top pick for creating all sorts of images.

What is Stable Diffusion 3?

Stable Diffusion 3 is a series of advanced text-to-image models ranging from 800 million to 8 billion parameters, designed to create detailed and realistic images from text prompts. Models with more parameters produce higher-quality images, but they cost more and take longer to run, while models with fewer parameters suit quick, simple tasks. As the latest iteration of Stable Diffusion image generation technology, it is a powerful tool for developers and content creators.

Key Features of Stable Diffusion 3

SD3 Medium is a 2 billion parameter SD3 model that offers some notable features:

  • Overall Quality and Photorealism: Delivers images with exceptional detail, color, and lighting, enabling photorealistic outputs as well as high-quality outputs in flexible styles. Innovations such as the 16-channel VAE help it address common pitfalls of other models, such as unrealistic hands and faces.
  • Prompt Understanding: Comprehends long and complex prompts involving spatial reasoning, compositional elements, actions, and styles. By utilizing all three text encoders or a combination, users can trade off performance for efficiency.
  • Typography: Achieves unprecedented text quality with fewer errors in spelling, kerning, letter forming, and spacing by leveraging the Diffusion Transformer architecture.
  • Resource-efficient: Ideal for running on standard consumer GPUs without performance degradation, thanks to its low VRAM footprint.
  • Fine-Tuning: Capable of absorbing nuanced details from small datasets, making it perfect for customisation.
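
To put the low VRAM footprint in perspective, here is a rough back-of-the-envelope sketch. It counts only the weights of the 2 billion parameter model mentioned above and assumes they are stored in 16-bit or 32-bit floating point. Text encoders, the VAE, and activations are left out, so real usage will be higher. The function name is ours and purely illustrative.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


SD3_MEDIUM_PARAMS = 2_000_000_000  # 2 billion, as stated in this article

# Assumption: 2 bytes per parameter for fp16, 4 bytes for fp32.
fp16_gib = weight_memory_gib(SD3_MEDIUM_PARAMS, 2)
fp32_gib = weight_memory_gib(SD3_MEDIUM_PARAMS, 4)

print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"fp32 weights: {fp32_gib:.2f} GiB")
```

Under these assumptions, halving the precision halves the weight memory, which is one reason a model of this size can fit on a consumer GPU.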

What’s New in Stable Diffusion 3?

  • SD3 VS Midjourney: SD3 tends to produce images with higher visual appeal than Midjourney.
  • SD3 VS Dall-E-3: SD3 surpasses Dall-E-3 in terms of prompt following, as it can generate outputs that more accurately reflect the specified elements and themes.
  • SD3 VS SD1.5 and SDXL: SD3 demonstrates superior performance compared to SD1.5 and SDXL in terms of typography based on human evaluations.

Technologies Behind Stable Diffusion 3

Technical Details in Stable Diffusion 3

  • Diffusion Transformer (DiT) Architecture: The Diffusion Transformer (DiT) architecture is a class of diffusion models that utilizes transformer architecture for image generation. Unlike traditional approaches that rely on the U-Net backbone, DiTs operate on latent patches, allowing for the efficient and effective generation of high-quality images conditioned on textual input.
  • Flow Matching (FM) Technology: Flow Matching (FM) is a simulation-free technique for training Continuous Normalizing Flows (CNFs) by regressing vector fields of fixed conditional probability paths. FM can provide a more stable alternative for training diffusion models: the paths are more efficient, training and sampling are faster, and generalization performance is enhanced.
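
To make "operating on latent patches" concrete, here is a minimal sketch of the patchify step in plain Python. The patch size of 2, the toy 4 x 4 single-channel latent, and the function name are illustrative assumptions, not SD3's actual configuration.

```python
from typing import List

Latent = List[List[List[float]]]  # indexed as [channel][row][col]


def patchify(latent: Latent, patch: int) -> List[List[float]]:
    """Split a C x H x W latent into non-overlapping patch x patch tiles.

    Each tile is flattened into one token of length C * patch * patch.
    Tokens are returned in row-major order over the tile grid.
    """
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")

    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            tokens.append(token)
    return tokens


# Toy example: one channel, 4 x 4 grid holding the values 0..15.
toy_latent = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
toy_tokens = patchify(toy_latent, patch=2)
print(len(toy_tokens), "tokens of length", len(toy_tokens[0]))
```

In a real DiT, each token would then be linearly projected and combined with a positional embedding before entering the transformer blocks. Those steps are omitted here.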

How does Stable Diffusion 3 Work?

The SD3 architecture builds upon the DiT. For text-to-image generation, however, it must handle two modalities: text and images. SD3 therefore introduces a new architecture called the Multimodal Diffusion Transformer (MMDiT), which uses pre-trained models to derive suitable text and image representations. It uses three text embedders (two CLIP models and T5) to encode text, and an improved autoencoding model to encode image tokens.
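
One way to picture how a transformer can handle both modalities is joint attention: each modality gets its own projection weights, and the two token sequences are concatenated so that text and image tokens can attend to each other. The sketch below illustrates that idea only. The single attention head, the toy dimensions, the random weights, and all function names are assumptions for illustration, and the actual MMDiT block contains more than this.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_attention(
    text_tokens: Matrix,
    image_tokens: Matrix,
    text_w: Dict[str, Matrix],
    image_w: Dict[str, Matrix],
) -> Tuple[Matrix, Matrix]:
    """Single-head attention over the concatenated text and image sequence."""
    # Each modality is projected with its own weights, then sequences are joined.
    q = matmul(text_tokens, text_w["q"]) + matmul(image_tokens, image_w["q"])
    k = matmul(text_tokens, text_w["k"]) + matmul(image_tokens, image_w["k"])
    v = matmul(text_tokens, text_w["v"]) + matmul(image_tokens, image_w["v"])

    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    mixed = matmul(weights, v)

    # Split the result back into its text and image parts.
    n_text = len(text_tokens)
    return mixed[:n_text], mixed[n_text:]


rng = random.Random(0)
dim = 8
text_tokens = random_matrix(3, dim, rng)   # 3 toy text tokens
image_tokens = random_matrix(4, dim, rng)  # 4 toy image tokens
text_w = {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v")}
image_w = {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v")}

text_out, image_out = joint_attention(text_tokens, image_tokens, text_w, image_w)
print(len(text_out), len(image_out), len(image_out[0]))
```

Because attention runs over the joined sequence, every image token can draw on every text token and vice versa, while the separate weights let each modality keep its own representation.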

Stable Diffusion 3 employs a Rectified Flow (RF) formulation, in which data and noise are connected by a linear trajectory during training. This yields straighter inference paths, which in turn allow sampling with fewer steps. In addition, SD3 introduces a trajectory sampling schedule for training that gives more weight to the middle parts of the trajectory. Compared with earlier RF formulations, this re-weighted RF variant consistently improves performance. Finally, the re-weighted RF formulation and the MMDiT backbone are scaled up, with models ranging from 15 blocks with 450M parameters to 38 blocks with 8B parameters.
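
The linear trajectory and the middle-weighted schedule can be sketched in a few lines. The example below assumes the convention that t = 0 is data and t = 1 is noise, and it uses a logit-normal distribution as one example of a schedule that favors the middle of the trajectory. The parameter values and function names are illustrative assumptions rather than SD3's training code.

```python
import math
import random
from typing import List, Tuple


def sample_timestep(rng: random.Random, mean: float = 0.0, std: float = 1.0) -> float:
    """Logit-normal sample in (0, 1); with mean=0 it concentrates around 0.5."""
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def rectified_flow_pair(
    data: List[float], noise: List[float], t: float
) -> Tuple[List[float], List[float]]:
    """Return the point at time t on the straight data-to-noise path,
    plus the constant velocity (noise - data) that a model would regress."""
    point = [(1.0 - t) * x + t * n for x, n in zip(data, noise)]
    velocity = [n - x for x, n in zip(data, noise)]
    return point, velocity


rng = random.Random(0)
data = [1.0, -2.0, 0.5]
noise = [rng.gauss(0.0, 1.0) for _ in data]

# How often does the schedule land in the middle half of the trajectory?
timesteps = [sample_timestep(rng) for _ in range(10_000)]
middle_share = sum(0.25 < t < 0.75 for t in timesteps) / len(timesteps)
print(f"share of timesteps in (0.25, 0.75): {middle_share:.2f}")

# On a perfectly straight path, one step along the velocity spans the whole path.
start, velocity = rectified_flow_pair(data, noise, t=0.0)
end, _ = rectified_flow_pair(data, noise, t=1.0)
recovered = [n - v for n, v in zip(end, velocity)]
print("recovered data:", [round(x, 6) for x in recovered])
```

A uniform schedule would place half of its samples in the middle interval, while the logit-normal schedule places noticeably more there. The last lines show why straight paths help: when the velocity is constant, a single step from noise lands back on the data, whereas curved paths need many small steps.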

How to Access Stable Diffusion 3 API?

Novita AI now supports the Stable Diffusion 3 Medium model, so you can try it yourself.

The API integration is currently in beta. Join the waitlist for early access so you can integrate it into your existing AI image generator and develop new features.

Further Development of Stable Diffusion 3

Although SD3 is a cutting-edge AI technology, it has some limitations. As of this writing, only the SD3 Medium weights have been openly released; direct download and self-hosting are not yet available for the other models in the series. Still, Stable Diffusion 3 has plenty of room to develop, and we have high expectations for it.


In conclusion, Stable Diffusion 3 is a groundbreaking AI image generation model that offers significant improvements over its predecessors. With its robust capabilities and innovative features, Stable Diffusion 3 is not just a tool but a creative powerhouse that puts high-quality image creation at your fingertips. Creators of all backgrounds can harness the creative potential of AI-generated imagery and explore new frontiers in their artistic endeavors or business ventures.

Novita AI is a one-stop platform for limitless creativity that gives you access to 100+ APIs, spanning image generation, language processing, audio enhancement, and video manipulation. With cheap pay-as-you-go pricing, it frees you from GPU maintenance hassles while you build your own products. Try it for free.