By Novita AI / July 30, 2025
GLM 4.1V 9B Thinking is a breakthrough AI model: it can both see images and explain its reasoning step by step. This ability, called chain-of-thought (CoT) reasoning, is something GLM does better than other vision-language models of its size. You’ll see how it stacks up against much larger models and how you can try it out yourself, even if you don’t have an expensive GPU.
What Changes Did GLM 4.1V 9B Thinking Bring to VLMs?
As the world’s first VLM with chain-of-thought (CoT) reasoning, GLM has impressed not only with its powerful capabilities but also by achieving performance comparable to Qwen 2.5 72B despite being a much smaller 9B model. Next, let’s take a closer look at GLM’s detailed specifications and benchmark results.
GLM 4.1V 9B Thinking’s Attributes
You can start a free trial on Hugging Face directly!
How Does GLM 4.1V 9B Thinking Achieve These Improvements?
SFT Enforcement: The training samples include explicit Chain‑of‑Thought (CoT) annotations, so the model learns to “think first, then answer.” This differs from traditional models that only output an answer without showing the reasoning steps.
RLCS Reinforcement: The model’s reward isn’t based solely on correctness—it also evaluates the quality of the reasoning process and explanations, encouraging more coherent and thorough internal thinking.
Architectural Support: A ViT vision encoder feeds into an MLP projector and then into the LLM decoder, enabling the model to seamlessly generate explicit reasoning paths from visual input, rather than mere retrieval or pattern matching.
Strong Reasoning Foundation: Another key factor is the base model, GLM‑4‑9B 0414, which already possesses robust reasoning abilities. For instance:
It demonstrates excellent mathematical reasoning and general task performance, ranking at the top among open-source models at its scale.
Architecturally and training-wise, GLM‑4‑9B benefits from autoregressive blank‑infilling pre‑training and subsequent fine‑tuning that strengthen logical and multi-step reasoning skills.
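The ViT encoder → MLP projector → LLM decoder pipeline described above can be sketched in a few lines of NumPy. All the dimensions and layer shapes below are illustrative stand-ins, not GLM’s actual configuration; the point is only to show how vision tokens are mapped into the LLM’s embedding space and concatenated with text tokens in one sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not GLM's real configuration.
NUM_PATCHES = 196    # e.g. a 224x224 image split into 16x16 patches
VIT_DIM = 1024       # ViT encoder output width
LLM_DIM = 4096       # LLM decoder embedding width

def vit_encode(image_patches: np.ndarray) -> np.ndarray:
    """Stand-in for the ViT encoder: one linear layer per flattened patch."""
    w = rng.standard_normal((image_patches.shape[-1], VIT_DIM)) * 0.02
    return image_patches @ w

def mlp_project(vis_tokens: np.ndarray) -> np.ndarray:
    """MLP projector mapping vision features into the LLM embedding space."""
    w1 = rng.standard_normal((VIT_DIM, LLM_DIM)) * 0.02
    w2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02
    return np.maximum(vis_tokens @ w1, 0) @ w2  # linear -> ReLU -> linear

# A fake image: 196 flattened 16x16x3 patches.
patches = rng.standard_normal((NUM_PATCHES, 16 * 16 * 3))
vision_embeddings = mlp_project(vit_encode(patches))

# Fake embeddings for a 12-token text prompt.
text_embeddings = rng.standard_normal((12, LLM_DIM))

# The decoder sees vision and text tokens as one sequence, so generated
# CoT text can attend directly to image content.
decoder_input = np.concatenate([vision_embeddings, text_embeddings], axis=0)
print(decoder_input.shape)  # (208, 4096)
```

Because the reasoning text is generated autoregressively over this joint sequence, the explicit reasoning path is conditioned on the visual tokens themselves rather than on a separate retrieval step.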
Even more impressively, GLM 4.1V 9B Thinking has only 9 billion parameters, allowing it to run on GPUs like the RTX 4090 or even the 3090. Compared to other models with several times more parameters, GLM achieves outstanding performance with a much smaller size—an achievement that undoubtedly highlights the power of reinforcement learning.
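A rough back-of-the-envelope calculation makes the hardware claim concrete: at bf16 precision (2 bytes per parameter), the weights alone of a 9B model need about 17 GiB, which fits within the 24 GB of an RTX 4090 or 3090. Activations and the KV cache add overhead on top of this, which is why quantized variants are often used in practice:

```python
# Estimate weight memory for a 9B-parameter model in bf16.
params = 9_000_000_000   # 9 billion parameters
bytes_per_param = 2      # bf16 / fp16 uses 2 bytes per parameter

weight_bytes = params * bytes_per_param
weight_gib = weight_bytes / 2**30

print(f"{weight_gib:.1f} GiB")  # prints "16.8 GiB"
```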
If purchasing a GPU seems too costly, you can take advantage of Novita AI’s cost-effective and reliable cloud GPUs—such as the RTX 4090 for just $0.69 per hour or the RTX 3090 for only $0.21 per hour!
Browse through the available options and select the model that suits your needs.
Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.
Step 4: Get Your API Key
To authenticate with the API, you need an API key. On the “Settings” page, copy the API key as shown in the image.
Step 5: Install the SDK
Install the client library using the package manager for your programming language (for Python, for example, pip install requests, since Novita AI exposes an OpenAI-compatible endpoint).
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI LLMs. Here is an example of using the chat completions API in Python.
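The sketch below calls Novita AI’s OpenAI-compatible chat completions endpoint (the same https://api.novita.ai/v3/openai path the MCP sample further down uses). The model id string is a placeholder; check the exact id in Novita AI’s model list. The request body is built in a separate function so it can be reused or inspected before sending:

```python
import os
import requests

API_URL = "https://api.novita.ai/v3/openai/chat/completions"

def build_payload(model: str, prompt: str, max_tokens: int = 200) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# "glm-4.1v-9b-thinking" is a placeholder id; verify it against the model list.
payload = build_payload("glm-4.1v-9b-thinking",
                        "Explain chain-of-thought reasoning in one sentence.")

if __name__ == "__main__" and os.environ.get("NOVITA_API_KEY"):
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])
```

The network call only runs when NOVITA_API_KEY is set in your environment, so the script fails safely if you haven’t configured a key yet.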
Build a Simple Image Recognition Tool Using MCP and GLM
If you want to leverage the capabilities of GLM—such as building a simple image recognition tool to demonstrate its integration of visual recognition and reasoning—you can use the MCP functionality supported by Novita AI. Below is the sample code:
import os

import requests
from mcp.server.fastmcp import FastMCP

base_url = "https://api.novita.ai/v3"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
}

mcp = FastMCP("Novita_API")


@mcp.tool()
def list_models() -> str:
    """List all available models from the Novita API."""
    url = base_url + "/openai/models"
    response = requests.get(url, headers=headers)
    data = response.json()["data"]
    text = ""
    for model in data:
        text += f"Model id: {model['id']}\n"
        text += f"Model description: {model['description']}\n"
        text += f"Model type: {model['model_type']}\n\n"
    return text


@mcp.tool()
def get_model(model_id: str, message: str) -> str:
    """Provide a model ID and a message to get a response from the Novita API."""
    url = base_url + "/openai/chat/completions"
    payload = {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": message,
            }
        ],
        "max_tokens": 200,
        "response_format": {
            "type": "text",
        },
    }
    response = requests.post(url, json=payload, headers=headers)
    content = response.json()["choices"][0]["message"]["content"]
    return content


@mcp.tool()
def vision_chat(model_id: str, image_url: str, question: str) -> str:
    """Use GLM-4.1V-9B-Thinking to answer a question about an image."""
    url = base_url + "/openai/chat/completions"
    payload = {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                # Multimodal content: the image part first, then the question.
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url},
                    },
                    {
                        "type": "text",
                        "text": question,
                    },
                ],
            }
        ],
        "max_tokens": 500,
    }
    response = requests.post(url, json=payload, headers=headers)
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Run using stdio transport
    mcp.run(transport="stdio")
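To use the server from an MCP-capable client, register it as a stdio server in the client’s configuration. The sketch below shows a typical shape for such a config; the server name, file path, and config location are placeholders, so consult your client’s documentation for the exact details:

```json
{
  "mcpServers": {
    "novita": {
      "command": "python",
      "args": ["/path/to/novita_mcp_server.py"],
      "env": {
        "NOVITA_API_KEY": "<your-api-key>"
      }
    }
  }
}
```

Once registered, the client launches the script as a subprocess and can call list_models, get_model, and vision_chat as tools.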
GLM 4.1V 9B Thinking is easy to use, super smart, and doesn’t need fancy hardware. You can test its image recognition and reasoning abilities with just a few lines of code, either on your own machine or in the cloud. If you want to see how far multimodal AI has come, give GLM a try!
What’s special about GLM 4.1V 9B Thinking?
It’s the first model that can both “see” and show its reasoning steps, not just give answers.
Can I try GLM 4.1V 9B if I don’t have a powerful GPU?
Yes! You can use affordable cloud GPUs or try it for free on the Novita AI Playground.
How can I integrate GLM 4.1V 9B Thinking into my own projects?
You can run it locally using Python and Hugging Face, access it via Novita AI’s API, or even build your own image recognition tools using MCP as shown in the provided code samples.
Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless inference, and GPU instances: the cost-effective tools you need. Eliminate infrastructure overhead, start for free, and make your AI vision a reality.