Nemotron 3 Nano 30B A3B is available on Novita AI as a Serverless LLM for OpenAI-compatible chat completions under the model ID nvidia/nemotron-3-nano-30b-a3b. The Novita model page lists a 256K context window, 32,768 max output tokens, text input and output, function calling, structured outputs, and reasoning. As of June 11, 2026, Novita lists pricing at $0.05 per million input tokens and $0.20 per million output tokens, making it a practical option for long-context agent, coding, math, or tool-use workflows when you do not want to manage model infrastructure.
Table Of Contents
- What is Nemotron 3 Nano 30B A3B?
- Novita AI availability and pricing
- When should developers use it?
- Quick start: call the Nemotron 3 Nano 30B A3B API
- Use function calling, structured outputs, and reasoning carefully
- FAQ
What is Nemotron 3 Nano 30B A3B?
Nemotron 3 Nano 30B A3B is an NVIDIA model listed on the Novita AI Nemotron 3 Nano 30B A3B model page as a compute-efficient, open-weight reasoning model for agentic AI. The page describes it as a Mixture-of-Experts model with 30B total parameters and 3.5B active parameters, using a hybrid Mamba-2 and Transformer architecture.
For developers, the practical point goes beyond the architecture: the model is exposed through Novita AI’s Serverless LLM API, so you can call it with the same OpenAI-compatible chat completion pattern used by other Novita language models.
| Field | Current value |
|---|---|
| Display name | Nemotron 3 Nano 30B A3B |
| API model ID | nvidia/nemotron-3-nano-30b-a3b |
| Provider / series shown by Novita | Nvidia |
| Category | LLM, Serverless |
| Endpoint | chat/completions |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 256K tokens |
| Max output tokens | 32,768 |
| Listed feature flags | Serverless, function calling, structured outputs, reasoning |
| Quantization shown by Novita | fp4 |
This makes the model a fit for tasks where you need a large prompt budget, tool-use patterns, and JSON-shaped responses, but still want a hosted API rather than a self-managed deployment.
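The limits in the table above can be turned into a pre-flight check before a request is sent. The sketch below is illustrative only: it assumes "256K" means 262,144 tokens and that prompt and output tokens share the same window, neither of which is confirmed here, and `fits_context` is a hypothetical helper name.

```python
# Limits as listed in the table above.
# Assumption: "256K" is treated as 256 * 1024 tokens; confirm the exact figure.
CONTEXT_WINDOW_TOKENS = 256 * 1024
MAX_OUTPUT_TOKENS = 32_768


def fits_context(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if the prompt plus requested output fits the listed limits.

    Assumption: prompt and output tokens count against the same window.
    """
    if prompt_tokens < 0 or max_tokens < 1:
        raise ValueError("prompt_tokens must be >= 0 and max_tokens must be >= 1")
    if max_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW_TOKENS
```

Token counts depend on the model's tokenizer, so treat any client-side estimate as approximate and leave headroom.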
Novita AI availability and pricing
The model is currently listed as a new Serverless LLM on Novita AI. Use the exact model ID nvidia/nemotron-3-nano-30b-a3b in API calls.
As of June 11, 2026, Novita lists token pricing as:
| Token type | Price |
|---|---|
| Input tokens | $0.05 per 1M tokens |
| Output tokens | $0.20 per 1M tokens |
Pricing and availability can change, so production teams should check the Nemotron 3 Nano 30B A3B model page and the Novita AI pricing page before launch or procurement review.
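To make the per-million rates concrete, here is a small cost estimate using the prices listed above as of June 11, 2026. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and the rates should be updated if the pricing page changes.

```python
from decimal import Decimal

# Rates listed above (USD per 1M tokens, as of June 11, 2026).
INPUT_USD_PER_MILLION = Decimal("0.05")
OUTPUT_USD_PER_MILLION = Decimal("0.20")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the token cost of one request at the listed rates."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / million
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / million
    return input_cost + output_cost


# Example: a 200,000-token prompt with a 2,000-token reply.
# 200,000 * 0.05 / 1M = 0.0100 and 2,000 * 0.20 / 1M = 0.0004, so 0.0104 USD.
example_cost = estimate_cost_usd(200_000, 2_000)
```

At these rates a long prompt is dominated by input cost, so prompt size is usually the first thing to trim in long-context workloads.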
Novita also exposes the model through an OpenAI-compatible API base URL:
https://api.novita.ai/openai
For chat completions, the endpoint path is:
POST https://api.novita.ai/openai/v1/chat/completions
Authentication uses a Bearer token in the Authorization header. Keep API keys in environment variables or your secret manager; do not hard-code them in application code.
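A minimal sketch of that advice: read the key from the environment and fail early if it is missing. The helper name `build_headers` is hypothetical, and `NOVITA_API_KEY` is simply the variable name used in the examples in this article.

```python
import os
from typing import Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers with a Bearer token read from the environment."""
    api_key = env.get("NOVITA_API_KEY", "").strip()
    if not api_key:
        # Fail before any request is made rather than sending an empty token.
        raise RuntimeError("NOVITA_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Passing the environment as a parameter keeps the helper easy to test without touching real credentials.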
When should developers use it?
Use Nemotron 3 Nano 30B A3B when your application needs long context, structured model output, or tool-use-oriented reasoning from a serverless text model.
Good evaluation cases include:
- Long-context agents that need to read larger project files, logs, transcripts, or knowledge-base chunks.
- Coding assistants that need enough context to inspect multiple files before generating a plan or patch.
- Math, planning, and multi-step analysis workflows where the model’s reasoning feature flag matters.
- Agent workflows that call tools through function calling.
- Data extraction tasks that need structured JSON responses instead of free-form prose.
Avoid assuming it is the best model for every task. For latency-sensitive short prompts, image or audio inputs, strict benchmark targets, or workloads with a known model preference, test it against your existing candidate set. The model page verifies the availability and feature flags; it does not replace your own evaluation on production prompts.
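One way to run that comparison is a small harness that sends the same production prompts to each candidate and records latency and output. This is a sketch under stated assumptions: `compare_models` and the injected `call_model` callable are hypothetical, and `call_model` would wrap whatever client you already use.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def compare_models(
    models: List[str],
    prompts: List[str],
    call_model: Callable[[str, str], str],
) -> Dict[str, dict]:
    """Run every prompt against every model and collect outputs and latency."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    results: Dict[str, dict] = {}
    for model in models:
        latencies, outputs = [], []
        for prompt in prompts:
            start = time.perf_counter()
            outputs.append(call_model(model, prompt))
            latencies.append(time.perf_counter() - start)
        results[model] = {
            "median_latency_s": median(latencies),
            "outputs": outputs,
        }
    return results
```

Latency alone does not decide the comparison; review the collected outputs against your own quality criteria as well.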
Quick start: call the Nemotron 3 Nano 30B A3B API
The simplest way to start is to call the OpenAI-compatible chat completions endpoint with the verified model ID.
cURL
```bash
# Store the key in the environment instead of hard-coding it.
export NOVITA_API_KEY="your_api_key"

curl "https://api.novita.ai/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${NOVITA_API_KEY}" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise technical assistant."
      },
      {
        "role": "user",
        "content": "Summarize the risks in this API migration plan and return three action items."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.2
  }'
```
Python
If your application already uses the OpenAI Python SDK pattern, set the Novita OpenAI-compatible base URL and update the model name.
```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Novita's OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],  # read the key from the environment
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {
            "role": "user",
            "content": "Summarize the risks in this API migration plan and return three action items.",
        },
    ],
    max_tokens=512,
    temperature=0.2,
)

# Print the assistant's reply text.
print(response.choices[0].message.content)
```
For implementation details, see the Novita AI LLM API guide and the chat completions API reference.
Use function calling, structured outputs, and reasoning carefully
Novita lists function calling, structured outputs, and reasoning among the model’s feature flags. These features are most useful when your application needs predictable interfaces between the model and the rest of your system.
For function calling, pass a tools array with function definitions. The chat completions API supports function tools with names, descriptions, JSON Schema parameters, and a strict option.
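As a rough illustration of a tools array, the sketch below builds a request body with one function tool and parses tool calls from a response message. The tool `get_ticket_status` is hypothetical, and both the placement of `strict` inside the function definition and the `tool_calls` response shape follow the common OpenAI-compatible convention, which is an assumption to confirm against the API reference.

```python
import json

MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"

# Hypothetical tool, defined only for this example.
GET_TICKET_STATUS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the current status of a support ticket.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
            "additionalProperties": False,
        },
        "strict": True,  # assumption: strict sits inside the function object
    },
}


def build_tool_request(user_message: str) -> dict:
    """Build a chat completions request body that offers one function tool."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [GET_TICKET_STATUS_TOOL],
        "max_tokens": 512,
    }


def parse_tool_calls(message: dict) -> list:
    """Return (name, arguments) pairs from an assistant message dict.

    Assumption: arguments arrive as a JSON-encoded string, as in the
    OpenAI-compatible response shape.
    """
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

Your application still has to execute the named function itself and send the result back in a follow-up message; the model only proposes the call.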
For structured outputs, use response_format with json_schema when the model and schema are supported. The API reference notes that strict structured outputs support a subset of JSON Schema, so test your exact schema before depending on it in production.
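The following sketch shows one possible `response_format` payload with a `json_schema` and a defensive parser for the reply. The schema, the name `migration_review`, and the helper names are illustrative assumptions, and the nesting of `name`, `strict`, and `schema` follows the common OpenAI-compatible layout rather than anything confirmed in this article.

```python
import json

MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"

# Hypothetical schema for this example; keep schemas small and explicit.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "risks": {"type": "array", "items": {"type": "string"}},
        "action_items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["risks", "action_items"],
    "additionalProperties": False,
}


def build_structured_request(plan_text: str) -> dict:
    """Build a request body that asks for JSON matching REVIEW_SCHEMA."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "user", "content": f"Review this migration plan:\n{plan_text}"}
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "migration_review",
                "strict": True,
                "schema": REVIEW_SCHEMA,
            },
        },
        "max_tokens": 512,
    }


def parse_review(content: str) -> dict:
    """Parse the reply and verify the required keys are present."""
    data = json.loads(content)
    missing = [key for key in REVIEW_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return data
```

Validating on the client side as well keeps downstream code safe if a schema feature turns out to be outside the supported subset.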
For reasoning behavior, keep the distinction between model-level availability and request-level behavior clear. The Nemotron model page lists reasoning as a feature flag, while the chat completions API reference documents request parameters such as separate_reasoning and enable_thinking with model-specific support notes. Before using reasoning fields in production, run a small API test with this exact model ID and capture the response shape your application will handle.
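A small probe along those lines might look like the sketch below. It assumes `enable_thinking` and `separate_reasoning` are sent as top-level request fields, and that separated reasoning, if returned, appears under a `reasoning_content` key on the message; that key name is a guess, so inspect a real response before relying on it.

```python
from typing import Optional, Tuple

MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"


def build_reasoning_probe(
    enable_thinking: bool = True, separate_reasoning: bool = True
) -> dict:
    """Build a small request body for checking reasoning behavior."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": "What is 17 * 23? Answer briefly."}],
        "max_tokens": 1024,
        # Assumption: both flags are top-level fields and apply to this model.
        "enable_thinking": enable_thinking,
        "separate_reasoning": separate_reasoning,
    }


def split_reasoning(message: dict) -> Tuple[str, Optional[str]]:
    """Return (answer, reasoning) from an assistant message dict.

    Assumption: "reasoning_content" is a hypothetical field name; the
    function returns None for reasoning when the key is absent.
    """
    content = message.get("content") or ""
    return content, message.get("reasoning_content")
```

With the OpenAI Python SDK, non-standard fields such as these are typically passed through the `extra_body` argument of `client.chat.completions.create` rather than as named parameters.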
FAQ
Is Nemotron 3 Nano 30B A3B available on Novita AI?
Yes. The model is listed on Novita AI as a Serverless LLM with the model ID nvidia/nemotron-3-nano-30b-a3b.
What is the Nemotron 3 Nano 30B A3B context window?
Novita lists a 256K context window and 32,768 max output tokens for nvidia/nemotron-3-nano-30b-a3b.
How much does the Nemotron 3 Nano 30B A3B API cost on Novita AI?
As of June 11, 2026, Novita lists pricing at $0.05 per million input tokens and $0.20 per million output tokens.
Does the model support function calling and structured outputs?
The Novita model page lists function calling and structured outputs as feature flags for Nemotron 3 Nano 30B A3B. Validate your exact tool schema or JSON schema against the API before using it in production.
What endpoint should I use?
Use the OpenAI-compatible chat completions endpoint: https://api.novita.ai/openai/v1/chat/completions.