Deploy GLM-Image on Novita AI GPU Instance: Complete Setup Guide

GLM-Image represents a significant advancement in AI image generation, combining autoregressive and diffusion architectures to push the boundaries of visual fidelity. For developers and businesses building AI-powered visual applications, deploying this model efficiently is crucial—but traditional setup processes involve complex dependencies, environment configuration, and infrastructure management.

Novita AI eliminates these barriers with pre-configured GPU templates that let you deploy GLM-Image in minutes rather than hours. This guide walks you through the complete deployment process, from template selection to running your first inference, plus optimization strategies for production workloads.

Whether you’re prototyping a content generation tool, building an e-commerce visualization platform, or integrating advanced image synthesis into your application, this tutorial provides everything you need to get GLM-Image running on enterprise-grade GPU infrastructure.

What is GLM-Image?

GLM-Image is an advanced image generation model that combines autoregressive and diffusion decoder architectures to deliver exceptional visual quality and fine-grained detail rendering. Developed by the ZAI team, this hybrid approach positions GLM-Image as a powerful alternative to traditional Latent Diffusion Models (LDM), particularly excelling in knowledge-intensive image generation scenarios.

The model’s unique architecture enables it to generate highly detailed images while maintaining competitive performance with industry-standard approaches. Whether you’re building AI-powered design tools, content creation platforms, or visual synthesis applications, GLM-Image offers both flexibility and precision through its text-to-image and image-to-image generation capabilities.

Key capabilities:

  • Hybrid autoregressive + diffusion architecture for superior visual fidelity
  • Text-to-image generation with detailed prompt understanding
  • Image-to-image transformation and style transfer
  • Multi-image conditional generation
  • High-resolution output support (customizable dimensions)

For complete technical specifications and model documentation, visit the official GLM-Image repository.

Why Deploy GLM-Image on Novita AI?

Novita AI’s GPU infrastructure provides the ideal environment for running GLM-Image with pre-configured templates, instant deployment, and scalable computing resources. Unlike setting up local environments or managing cloud instances manually, Novita AI streamlines the entire deployment process from template selection to running inference.

Step-by-Step Deployment Guide

Step 1: Access GPU Console

Navigate to Novita AI’s GPU interface and click Get Started to enter the deployment management dashboard.

Step 2: Select GLM-Image Template

Locate GLM-Image in the template repository. Novita AI’s pre-built template includes all necessary dependencies, eliminating complex environment setup.

Step 3: Configure Infrastructure

Set your computing parameters:

  • Memory allocation: Ensure sufficient VRAM for model weights
  • Storage requirements: Allocate space for model files and generated images
  • Network settings: Configure according to your access requirements

Click Deploy to proceed with your configuration.
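
When sizing the memory allocation, a back-of-envelope estimate can help. The sketch below is illustrative only (it assumes bf16 weights at 2 bytes per parameter and adds a rough 20% margin for activations and buffers; the real footprint also depends on resolution and batch size, and the parameter count shown is a hypothetical example, not GLM-Image's official figure):

```python
def estimate_vram_gb(num_params_billions, bytes_per_param=2, overhead=1.2):
    """Rough VRAM estimate in GB: parameter count (billions) times bytes per
    parameter (2 for bf16), with a multiplier for activations and buffers."""
    return num_params_billions * bytes_per_param * overhead

# Hypothetical example: a 9B-parameter model in bf16
needed = estimate_vram_gb(9)  # 21.6 GB -> pick a GPU with at least 24 GB VRAM
```

Round up to the next GPU tier rather than sizing exactly to the estimate, since inference at higher resolutions allocates additional activation memory.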

Step 4: Review Configuration

Double-check your setup details and cost summary. When satisfied, click Deploy to initiate instance creation.

Step 5: Monitor Deployment

The system automatically redirects you to the instance management page. Your GLM-Image instance will be created in the background—no manual intervention required.

Step 6: Track Download Progress

Monitor the model download status in real-time. Your instance status will transition from Pulling to Running once deployment completes. Click the arrow icon next to your instance name for detailed progress information.

Step 7: Verify Service Status

Click the Logs button to view instance logs and confirm that the GLM-Image service has started successfully. Look for initialization confirmation messages indicating the model is ready for inference.

How to Get Started

Example: text2image.py

import torch
from diffusers.pipelines.glm_image import GlmImagePipeline

pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")

prompt = "A beautifully designed modern food magazine style dessert recipe illustration, themed around a raspberry mousse cake. The overall layout is clean and bright, divided into four main areas: the top left features a bold black title 'Raspberry Mousse Cake Recipe Guide', with a soft-lit close-up photo of the finished cake on the right, showcasing a light pink cake adorned with fresh raspberries and mint leaves; the bottom left contains an ingredient list section, titled 'Ingredients' in a simple font, listing 'Flour 150g', 'Eggs 3', 'Sugar 120g', 'Raspberry puree 200g', 'Gelatin sheets 10g', 'Whipping cream 300ml', and 'Fresh raspberries', each accompanied by minimalist line icons (like a flour bag, eggs, sugar jar, etc.); the bottom right displays four equally sized step boxes, each containing high-definition macro photos and corresponding instructions, arranged from top to bottom as follows: Step 1 shows a whisk whipping white foam (with the instruction 'Whip egg whites to stiff peaks'), Step 2 shows a red-and-white mixture being folded with a spatula (with the instruction 'Gently fold in the puree and batter'), Step 3 shows pink liquid being poured into a round mold (with the instruction 'Pour into mold and chill for 4 hours'), Step 4 shows the finished cake decorated with raspberries and mint leaves (with the instruction 'Decorate with raspberries and mint'); a light brown information bar runs along the bottom edge, with icons on the left representing 'Preparation time: 30 minutes', 'Cooking time: 20 minutes', and 'Servings: 8'. The overall color scheme is dominated by creamy white and light pink, with a subtle paper texture in the background, featuring compact and orderly text and image layout with clear information hierarchy."

image = pipe(
    prompt=prompt,
    height=32 * 32,  # 1024 px
    width=36 * 32,   # 1152 px
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("output_t2i.png")
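
Both examples pass `height` and `width` as multiples of 32 (e.g. `32 * 32` = 1024, `36 * 32` = 1152). If you want to target an arbitrary resolution, a small helper like the one below (an illustrative sketch, not part of the pipeline's API) snaps a dimension to the nearest multiple of 32 before you pass it in:

```python
def snap_to_multiple(value, multiple=32):
    """Round a target dimension to the nearest multiple (minimum one multiple)."""
    return max(multiple, round(value / multiple) * multiple)

# e.g. snapping a 1080x1920 target before calling the pipeline
height = snap_to_multiple(1080)  # 1088
width = snap_to_multiple(1920)   # 1920
```

The snapped values can then be used directly as the `height` and `width` arguments in the call above.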

You can modify the prompt in text2image.py, or run the existing example directly.

$ python3 text2image.py                                                                                                                                                             
Couldn't connect to the Hub: Cannot reach https://huggingface.co/api/models/zai-org/GLM-Image: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable..                   
Will try to load from local cache.                                                                                                                                                                                
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.47it/s]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 1391.52it/s, Materializing param=shared.weight]
Loading pipeline components...:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:02<00:00,  2.91it/s]Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.                                                                                                 
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1011/1011 [00:02<00:00, 359.59it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.02s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:11<00:00,  2.69it/s]

Generated image: output_t2i.png

Example: image2image.py

import torch                                                                                                            
from diffusers.pipelines.glm_image import GlmImagePipeline                                                              
from PIL import Image                                                                                                   
                                                                                                                        
pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")             
image_path = "cond.jpg"                                                                                                 
prompt = "Replace the background of the snow forest with an underground station featuring an automatic escalator."      
image = Image.open(image_path).convert("RGB")                                                                           
image = pipe(                                                                                                           
    prompt=prompt,                                                                                                      
    image=[image],  # can input multiple images for multi-image-to-image generation such as [image, image1]             
    height=33 * 32,                                                                                                     
    width=32 * 32,                                                                                                      
    num_inference_steps=30,                                                                                             
    guidance_scale=1.5,                                                                                                 
    generator=torch.Generator(device="cuda").manual_seed(42),                                                           
).images[0]                                                                                                             
                                                                                                                        
image.save("output_i2i.png") 
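
As the comment in the example notes, the `image` argument accepts a list, which enables multi-image conditional generation. A small helper like the one below (an illustrative sketch; the function name and file paths are hypothetical) prepares several conditioning images in the RGB form the pipeline expects:

```python
from PIL import Image


def load_condition_images(paths):
    """Open each conditioning image and normalize it to RGB,
    returning the list expected by the pipeline's `image` argument."""
    return [Image.open(p).convert("RGB") for p in paths]


# Usage with the pipeline from the example above (hypothetical file names):
# images = load_condition_images(["cond.jpg", "cond2.jpg"])
# result = pipe(prompt=prompt, image=images, ...).images[0]
```

Passing two or more conditioning images lets the model combine elements from each source, as in style transfer guided by a separate reference image.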

You can modify the prompt and input image in image2image.py, or run the existing example directly.

$ python3 image2image.py         

Couldn't connect to the Hub: Cannot reach https://huggingface.co/api/models/zai-org/GLM-Image: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable..                   
Will try to load from local cache.                                                                                                                                                                                
Loading pipeline components...:   0%|                                                                                                                                                       | 0/7 [00:00<?, ?it/s]Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.                                                                                                 
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1011/1011 [00:02<00:00, 360.88it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.46it/s]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 1426.62it/s, Materializing param=shared.weight]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.03s/it]
100%|███████████████████████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00,  2.97it/s]

Use Cases for GLM-Image

GLM-Image’s hybrid architecture makes it particularly effective for:

  • E-commerce: Product visualization and variant generation
  • Marketing: Campaign creative and social media content
  • Publishing: Editorial illustrations and infographics
  • Design: Concept art and visual prototyping
  • Education: Instructional diagrams and visual learning materials

The model’s strength in knowledge-intensive generation means it can accurately render complex scenes with specific requirements—ideal for projects demanding both creativity and precision.

Get Started with GLM-Image on Novita AI

Deploy GLM-Image on Novita AI’s GPU infrastructure today and access enterprise-grade image generation capabilities without the complexity of manual setup. Visit the GLM-Image template page to begin your deployment.

Novita AI is a leading AI cloud platform that provides developers with easy-to-use APIs and affordable, reliable GPU infrastructure for building and scaling AI applications.

