Nemotron 3 Nano 30B A3B on Novita AI: Launch, Pricing, and Quick Start


Nemotron 3 Nano 30B A3B is available on Novita AI as a Serverless LLM for OpenAI-compatible chat completions. The Novita model page lists the model ID nvidia/nemotron-3-nano-30b-a3b, a 256K context window, 32,768 max output tokens, text input and output, and support for function calling, structured outputs, and reasoning. As of June 11, 2026, Novita lists pricing at $0.05 per million input tokens and $0.20 per million output tokens. That makes it a practical option for long-context agent, coding, math, or tool-use workflows when you do not want to manage model infrastructure.


What is Nemotron 3 Nano 30B A3B?

Nemotron 3 Nano 30B A3B is an NVIDIA model that the Novita AI Nemotron 3 Nano 30B A3B model page describes as a compute-efficient, open-weight reasoning model for agentic AI. According to that page, it is a Mixture-of-Experts (MoE) model with 30B total parameters and 3.5B active parameters, built on a hybrid Mamba-2 and Transformer architecture.
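
As a rough illustration of the "compute-efficient" description, the share of parameters active for each token, using only the figures listed on the model page, is:

$$
\frac{\text{active parameters}}{\text{total parameters}} = \frac{3.5\,\text{B}}{30\,\text{B}} \approx 0.117 \approx 11.7\%
$$

Per-token compute scales roughly with the active parameters rather than the total, although whoever serves the model still has to hold all 30B weights in memory. Treat this as a back-of-envelope ratio, not a throughput or latency measurement.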

For developers, the more practical point is that the model is exposed through Novita AI's Serverless LLM API, so you can call it with the same OpenAI-compatible chat completion pattern used by other Novita language models.

Field | Current value
--- | ---
Display name | Nemotron 3 Nano 30B A3B
API model ID | nvidia/nemotron-3-nano-30b-a3b
Provider / series shown by Novita | Nvidia
Category | LLM, Serverless
Endpoint | chat/completions
Input modalities | Text
Output modalities | Text
Context window | 256K tokens
Max output tokens | 32,768
Listed feature flags | Serverless, function calling, structured outputs, reasoning
Quantization shown by Novita | fp4

This makes the model a fit for tasks where you need a large prompt budget, tool-use patterns, and JSON-shaped responses, but still want a hosted API rather than a self-managed deployment.
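
Before sending very long prompts, it can help to check a request against the listed limits. The sketch below is a minimal guard under two assumptions: that prompt and output tokens share the context window (common for hosted LLM APIs, but worth confirming), and that "256K" means 256 × 1024 tokens. The function name `request_fits` is hypothetical.

```python
# Limits as listed on the Novita model page.
# ASSUMPTION: "256K" is read as 256 * 1024 = 262,144 tokens; confirm the exact
# figure on the model page or from the API's error messages.
CONTEXT_WINDOW_TOKENS = 256 * 1024
MAX_OUTPUT_TOKENS = 32_768


def request_fits(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    ASSUMPTION: prompt and output tokens share the context window.
    prompt_tokens should come from a real token count (for example the
    usage.prompt_tokens value of an earlier response), not a character guess.
    """
    if max_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW_TOKENS
```

For example, `request_fits(200_000, 4_096)` is True, while `request_fits(250_000, 32_768)` is False because the combined total exceeds 262,144 tokens.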

Novita AI availability and pricing

Novita AI currently lists the model as a new Serverless LLM. Use the exact model ID nvidia/nemotron-3-nano-30b-a3b in API calls.

As of June 11, 2026, Novita lists token pricing as:

Token type | Price
--- | ---
Input tokens | $0.05 per 1M tokens
Output tokens | $0.20 per 1M tokens

Pricing and availability can change, so production teams should check the Nemotron 3 Nano 30B A3B model page and the Novita AI pricing page before launch or procurement review.
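
For a quick budget estimate, the listed prices reduce to a simple per-request calculation. This sketch hard-codes the June 11, 2026 prices, and `estimate_cost_usd` is a hypothetical helper. Reconcile real spending against the `usage` token counts the API returns and against Novita's own billing.

```python
# Prices listed by Novita as of June 11, 2026, in USD per 1M tokens.
# Re-check the model and pricing pages; these values can change.
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the token cost in USD for the given input and output counts."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )


if __name__ == "__main__":
    # Example: a 200K-token prompt that produces a 2K-token answer.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

Run as a script, this prints `$0.0104`. The input accounts for $0.0100 of that total, so input tokens dominate the cost of long-context calls even though each output token costs four times as much.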

Novita also exposes the model through an OpenAI-compatible API base URL:

https://api.novita.ai/openai

For chat completions, the endpoint path is:

POST https://api.novita.ai/openai/v1/chat/completions

Authentication uses a Bearer token in the Authorization header. Keep API keys in environment variables or your secret manager; do not hard-code them in application code.
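
If you call the endpoint without an SDK, the Bearer-token pattern needs only the Python standard library. This is a minimal sketch. The helper names `build_chat_request` and `send` are hypothetical, the key is read from the same `NOVITA_API_KEY` environment variable the cURL example uses, and building the request is kept separate from sending it so the request can be inspected or tested offline.

```python
import json
import os
import urllib.request

# Endpoint and model ID as documented above.
CHAT_COMPLETIONS_URL = "https://api.novita.ai/openai/v1/chat/completions"
MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"


def build_chat_request(messages: list[dict], max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an authenticated chat completions request."""
    # Raises KeyError if the variable is unset, so a missing key fails fast.
    api_key = os.environ["NOVITA_API_KEY"]
    body = json.dumps({"model": MODEL_ID, "messages": messages, "max_tokens": max_tokens})
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```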

When should developers use it?

Use Nemotron 3 Nano 30B A3B when your application needs long context, structured model output, or tool-use-oriented reasoning from a serverless text model.

Good evaluation cases include:

  • Long-context agents that need to read larger project files, logs, transcripts, or knowledge-base chunks.
  • Coding assistants that need enough context to inspect multiple files before generating a plan or patch.
  • Math, planning, and multi-step analysis workflows where the model’s reasoning feature flag matters.
  • Agent workflows that call tools through function calling.
  • Data extraction tasks that need structured JSON responses instead of free-form prose.

Do not assume it is the best model for every task. For latency-sensitive short prompts, image or audio inputs, strict benchmark targets, or workloads with a known model preference, test it against your existing candidate set. The model page confirms availability and lists feature flags; it does not replace your own evaluation on production prompts.

Quick start: call the Nemotron 3 Nano 30B A3B API

The simplest way to start is to call the OpenAI-compatible chat completions endpoint with the verified model ID.

cURL

export NOVITA_API_KEY="your_api_key"

curl "https://api.novita.ai/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${NOVITA_API_KEY}" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise technical assistant."
      },
      {
        "role": "user",
        "content": "Summarize the risks in this API migration plan and return three action items."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.2
  }'

Python

If your application already uses the OpenAI Python SDK pattern, set the Novita OpenAI-compatible base URL and update the model name.

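import os
from openai import OpenAI

# Point the OpenAI SDK at Novita's OpenAI-compatible base URL.
# The API key is read from the environment, never hard-coded.
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",  # exact model ID from the model page
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {
            "role": "user",
            "content": "Summarize the risks in this API migration plan and return three action items.",
        },
    ],
    max_tokens=512,  # must stay within the listed 32,768 max output tokens
    temperature=0.2,  # low temperature for more consistent summaries
)

# The generated text is on the first choice's message.
print(response.choices[0].message.content)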

For implementation details, see the Novita AI LLM API guide and the chat completions API reference.

Use function calling, structured outputs, and reasoning carefully

Novita lists function calling, structured outputs, and reasoning among the model’s feature flags. These features are most useful when your application needs predictable interfaces between the model and the rest of your system.

For function calling, pass a tools array with function definitions. The chat completions API supports function tools with names, descriptions, JSON Schema parameters, and a strict option.
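
A minimal sketch of one function-calling round trip, assuming Novita returns the standard OpenAI-compatible `tool_calls` shape. The tool `get_service_status` and the helper `run_tool_call` are hypothetical examples, not part of any Novita API. `strict` is placed inside the function definition, as in the OpenAI-compatible format. OpenAI's strict mode expects `additionalProperties: false` with every property listed in `required`, so the schema follows that convention; check whether Novita applies the same rules.

```python
import json


# Hypothetical local tool used only for illustration; replace with your own.
def get_service_status(service: str) -> dict:
    return {"service": service, "status": "operational"}


# Tool definition: a name, a description, JSON Schema parameters, and strict.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_service_status",
            "description": "Return the current status of a named internal service.",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {"type": "string", "description": "Service name, e.g. 'billing'."}
                },
                "required": ["service"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

# Maps tool names the model may request to local Python callables.
LOCAL_TOOLS = {"get_service_status": get_service_status}


def run_tool_call(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one model-requested tool call and build the follow-up message.

    The model returns arguments as a JSON string, so parse it before calling.
    """
    args = json.loads(arguments_json)
    result = LOCAL_TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}
```

With the `client` from the Python quick start, pass `tools=TOOLS` to `client.chat.completions.create(...)`. If `response.choices[0].message.tool_calls` is non-empty, append that assistant message to `messages`. Then append `run_tool_call(call.id, call.function.name, call.function.arguments)` for each call and send a second request so the model can produce its final answer.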

For structured outputs, use response_format with json_schema when the model and schema are supported. The API reference notes that strict structured outputs support a subset of JSON Schema, so test your exact schema before depending on it in production.
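
The following sketch shows a `response_format` value and a client-side check. It assumes the OpenAI-compatible `json_schema` shape described in the API reference. The schema, the name `migration_review`, and `parse_action_items` are hypothetical, chosen to match the quick-start prompt. The schema deliberately uses only basic keywords because strict mode supports a subset of JSON Schema.

```python
import json

# Hypothetical schema for the quick-start prompt (risks plus action items).
# Only basic keywords are used, since strict mode supports a JSON Schema subset.
ACTION_ITEMS_SCHEMA = {
    "type": "object",
    "properties": {
        "risks": {"type": "array", "items": {"type": "string"}},
        "action_items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["risks", "action_items"],
    "additionalProperties": False,
}

# OpenAI-compatible response_format value (pass as response_format=...).
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "migration_review",
        "schema": ACTION_ITEMS_SCHEMA,
        "strict": True,
    },
}


def parse_action_items(content: str) -> dict:
    """Parse the model's JSON reply and check the fields the app relies on.

    Raises ValueError (json.JSONDecodeError is a subclass) on bad output.
    """
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("risks", "action_items"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(x, str) for x in value):
            raise ValueError(f"field {key!r} is missing or not a list of strings")
    return data
```

Pass `response_format=RESPONSE_FORMAT` in the `create(...)` call and run `parse_action_items(response.choices[0].message.content)` on the reply. The client-side check stops a malformed response from propagating into your application, even when the API accepts the schema.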

For reasoning behavior, keep the distinction between model-level availability and request-level behavior clear. The Nemotron model page lists reasoning as a feature flag, while the chat completions API reference documents request parameters such as separate_reasoning and enable_thinking with model-specific support notes. Before using reasoning fields in production, run a small API test with this exact model ID and capture the response shape your application will handle.
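
One way to capture that response shape is to log the keys of the returned message. The sketch below works on the raw JSON response as a dict. The field name `reasoning_content` is an assumption borrowed from other OpenAI-compatible servers, not something the model page confirms, so the hypothetical helper `describe_message` returns `None` when the field is absent.

```python
def describe_message(response: dict) -> dict:
    """Summarize the first choice's message from a raw chat completions response.

    ASSUMPTION: some OpenAI-compatible servers return reasoning text in a
    separate `reasoning_content` field; confirm with a real call whether this
    model does, and under which request parameters.
    """
    message = response["choices"][0]["message"]
    return {
        "keys": sorted(message.keys()),  # record this shape in your tests
        "content": message.get("content"),
        "reasoning": message.get("reasoning_content"),  # None if absent
    }
```

With the OpenAI Python SDK, `response.model_dump()` converts the quick-start response into a dict you can pass to `describe_message`. If the API reference says a parameter such as `enable_thinking` or `separate_reasoning` applies to this model, the SDK can send it through the `extra_body` argument of `create(...)`. Check the reference for the accepted values.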

FAQ

Is Nemotron 3 Nano 30B A3B available on Novita AI?

Yes. The model is listed on Novita AI as a Serverless LLM with the model ID nvidia/nemotron-3-nano-30b-a3b.

What is the Nemotron 3 Nano 30B A3B context window?

Novita lists a 256K context window and 32,768 max output tokens for nvidia/nemotron-3-nano-30b-a3b.

How much does the Nemotron 3 Nano 30B A3B API cost on Novita AI?

As of June 11, 2026, Novita lists pricing at $0.05 per million input tokens and $0.20 per million output tokens.

Does the model support function calling and structured outputs?

The Novita model page lists function calling and structured outputs as feature flags for Nemotron 3 Nano 30B A3B. Validate your exact tool schema or JSON schema against the API before using it in production.

What endpoint should I use?

Use the OpenAI-compatible chat completions endpoint: https://api.novita.ai/openai/v1/chat/completions.

