Novita AI is excited to announce that GPT OSS – OpenAI’s groundbreaking open-weight language models – are now available through our inference API. The GPT OSS family comprises two state-of-the-art reasoning models: gpt-oss-120b and gpt-oss-20b. Both are released under the Apache 2.0 license.
According to OpenAI, this release is a meaningful step in their commitment to the open-source ecosystem, in line with their stated mission to make the benefits of AI broadly accessible. The models are designed for agentic workflows, tool use, and complex reasoning tasks, making them ideal for building sophisticated AI applications without the constraints of proprietary systems.
Novita AI offers the GPT OSS models at the following pricing:
- gpt-oss-120b: $0.10 input / $0.50 output per million tokens
- gpt-oss-20b: $0.05 input / $0.20 output per million tokens
Model Overview and Capabilities
GPT OSS models are mixture-of-experts (MoE) models that use a 4-bit quantization scheme (MXFP4), enabling fast inference while keeping resource usage low. Both models support chain-of-thought reasoning with adjustable reasoning effort levels, instruction following, and tool use.
| Model | Layers | Total Params | Active Params Per Token | Total Experts | Active Experts Per Token | Context Length |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 36 | 117B | 5.1B | 128 | 4 | 128k |
| gpt-oss-20b | 24 | 21B | 3.6B | 32 | 4 | 128k |
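As a quick sanity check on the table above, the active-parameter and active-expert fractions can be computed directly from the listed figures (a stdlib sketch; the numbers come straight from the table, everything else is plain arithmetic):

```python
# Figures from the model table above (parameter counts in billions)
models = {
    "gpt-oss-120b": {"total_b": 117, "active_b": 5.1, "experts": 128, "active_experts": 4},
    "gpt-oss-20b":  {"total_b": 21,  "active_b": 3.6, "experts": 32,  "active_experts": 4},
}

for name, m in models.items():
    param_frac = m["active_b"] / m["total_b"]          # share of weights used per token
    expert_frac = m["active_experts"] / m["experts"]   # share of experts routed per token
    print(f"{name}: {param_frac:.1%} of params active, "
          f"{expert_frac:.1%} of experts active per token")
```

Note that the active-parameter fraction is higher than the active-expert fraction: attention, embedding, and router weights are shared across all tokens, so only the MoE feed-forward weights are sparsified.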
gpt-oss-120b: High-Performance Reasoning
The gpt-oss-120b model features 117B total parameters with 5.1B active parameters. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on optimized infrastructure.
The model outperforms OpenAI o3‑mini and matches or exceeds OpenAI o4-mini on competition coding (Codeforces), general problem solving (MMLU and HLE), and tool calling (TauBench).
Test gpt-oss-120b in playground
gpt-oss-20b: Efficient Edge Reasoning
The gpt-oss-20b model contains 21B total parameters with 3.6B active parameters and is designed for efficient deployment. While the 120B model fits on a single H100 GPU, the 20B model runs within 16GB of memory, making it well suited to consumer hardware and on-device applications.
Despite its smaller size, it matches or exceeds OpenAI o3‑mini on standard benchmarks, even outperforming on competition mathematics (AIME 2024 & 2025) and health-related queries (HealthBench).
Test gpt-oss-20b in playground
Core Features and Technical Specifications
Architecture Details
- 21B and 117B total parameters with 3.6B and 5.1B active parameters, respectively
- 4-bit quantization scheme using mxfp4 format, applied only on the MoE weights
- Token-choice MoE with SwiGLU activations and softmax-after-topk for expert selection
- Rotary positional embeddings (RoPE) with a 128K context length
- Alternating attention layers: full-context attention interleaved with a sliding 128-token window
- Learned attention sink per-head for improved long-context performance
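The alternating attention pattern in the list above can be illustrated with a small mask-building sketch. This is a simplified illustration of the general technique, not the actual implementation; the 128-token window size comes from the architecture details listed here:

```python
def causal_mask(seq_len, window=None):
    """Build a boolean mask: True where query position q may attend to key k.

    window=None gives full causal attention; an integer limits attention
    to the `window` most recent tokens (inclusive of the current one).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never attend to future positions
            if window is not None:
                visible = visible and (q - k < window)
            row.append(visible)
        mask.append(row)
    return mask

# Alternate layer types as described: full-context, then sliding 128-token window
layer_masks = [causal_mask(256, window=None if i % 2 == 0 else 128) for i in range(4)]

# In a full-context layer, token 200 can see token 0; in a windowed layer it cannot
assert layer_masks[0][200][0] is True
assert layer_masks[1][200][0] is False
```

The windowed layers keep the attention cost per token bounded, while the interleaved full-context layers (helped by the learned attention sinks above) preserve long-range information flow.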
Key Capabilities
Reasoning Models: Text-only models with chain-of-thought and adjustable reasoning effort levels (“low”, “medium”, “high”)
Tool Use Support: Built-in support for web search, Python code execution, and custom tool integration
Structured Outputs: Native support for JSON, XML, and other structured data formats with schema validation
Responses API Compatibility: Full compatibility with OpenAI’s Responses API, the most advanced OpenAI interface for chat models, designed for more flexible and intuitive interactions
Apache 2.0 License: Maximum flexibility for commercial and research use. According to OpenAI, they aim for their tools to be used safely, responsibly, and democratically, while maximizing users' control over how they are used. By using gpt-oss, users agree to comply with all applicable laws.
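A minimal sketch of how a request might select one of the reasoning effort levels mentioned above. The `reasoning_effort` field name is an assumption about how a provider might expose this control alongside the usual chat parameters; consult the Novita AI API reference for the exact mechanism before relying on it:

```python
import json

# Hypothetical request payload; "reasoning_effort" is an assumed field name,
# not confirmed API -- check the provider's docs for the real parameter.
payload = {
    "model": "openai/gpt-oss-120b",
    "reasoning_effort": "high",  # "low", "medium", or "high", per the capability list above
    "messages": [
        {"role": "system", "content": "Be a helpful assistant"},
        {"role": "user", "content": "Prove that the sum of two even numbers is even."},
    ],
}

body = json.dumps(payload)
assert json.loads(body)["reasoning_effort"] in ("low", "medium", "high")
```

Higher effort levels trade latency and output tokens for deeper chain-of-thought, so "low" suits latency-sensitive chat while "high" suits competition-style math or coding tasks.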
Benchmark Performance

Safety Evaluation Results
OpenAI conducted comprehensive safety testing under their Preparedness Framework, including testing an adversarially fine-tuned version of gpt-oss-120b. Their methodology was reviewed by external experts and marks a step forward in setting new safety standards for open-weight models:
- Scalable Capability Evaluations: OpenAI confirmed that the default model does not reach their indicative thresholds for High capability in any of the three Tracked Categories (Biological and Chemical capability, Cyber capability, and AI Self-Improvement)
- Adversarial Fine-tuning Testing: Even with robust fine-tuning leveraging OpenAI’s field-leading training stack, gpt-oss-120b did not reach High capability in Biological and Chemical Risk or Cyber risk
- Frontier Risk Assessment: For most evaluations, the default performance of existing open models comes near to matching the adversarially fine-tuned performance of gpt-oss-120b
- External Review: OpenAI’s Safety Advisory Group (SAG) reviewed this testing and concluded the models meet safety standards
API Access Through Novita AI
Novita AI provides comprehensive access to GPT OSS models through both serverless and dedicated endpoints, with full OpenAI API compatibility.
Pricing and Model Details
Model Name: openai/gpt-oss-120b
- Input/Output Price (Novita AI):
- Input: $0.10 per million tokens
- Output: $0.50 per million tokens
- Context Size: 131,072
- Try it now: Test gpt-oss-120b in playground
Model Name: openai/gpt-oss-20b
- Input/Output Price (Novita AI):
- Input: $0.05 per million tokens
- Output: $0.20 per million tokens
- Context Size: 131,072
- Max Output: 32,768
- Try it now: Test gpt-oss-20b in playground
Get Started with Novita AI
Use the Playground (No Coding Required)
- Instant Access: Sign up and start experimenting with GPT OSS models in seconds
- Interactive Interface: Test complex reasoning prompts and visualize chain-of-thought outputs in real-time
- Model Comparison: Compare GPT OSS with other leading models for your specific use case
Integrate via API (For Developers)
Connect GPT OSS to your applications with Novita AI's unified REST API.
Option 1: Direct API Integration (Python Example)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="",
)

model = "openai/gpt-oss-120b"
stream = True  # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampler parameters outside the standard OpenAI schema go in extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Key Features:
- OpenAI-Compatible API for seamless integration
- Flexible parameter control for fine-tuning responses
- Streaming support for real-time responses
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build sophisticated multi-agent systems using GPT OSS:
- Plug-and-Play Integration: Use GPT OSS in any OpenAI Agents workflow
- Advanced Agent Capabilities: Support for handoffs, routing, and tool integration with superior reasoning performance
- Scalable Architecture: Design agents that leverage GPT OSS’s unified reasoning, coding, and agentic capabilities
Connect with Third-Party Platforms
- Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor, Trae, and Cline through OpenAI-compatible APIs
- Orchestration Frameworks: Connect with LangChain, Dify, CrewAI, Langflow, and other AI orchestration platforms using official connectors
- Hugging Face Integration: Novita AI serves as an official inference provider on Hugging Face
Model Architecture & Training
Pre-training and Model Development
The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models, including o3 and other frontier systems. They were trained extensively to leverage tool use as part of their reasoning process.
Post-training Optimization
Reinforcement Learning from Human Feedback (RLHF): Comprehensive alignment training for helpful, harmless, and honest responses
Safety Training: Extensive safety evaluations and adversarial testing to ensure responsible deployment
Reasoning Calibration: Fine-tuned reasoning effort control allowing optimization for different task complexities
Technical Innovation
Historic Open-Source Return: This marks OpenAI’s first open-weight language model since GPT-2, which was released more than five years ago, representing a meaningful step in their commitment to the open-source ecosystem
Advanced MoE Architecture: Sophisticated mixture-of-experts implementation with token-choice routing and optimized expert selection patterns
Efficient Quantization: Native 4-bit quantization using mxfp4 format enables fast inference while keeping resource usage low, with the 120B model fitting on a single 80GB GPU and the 20B model fitting in 16GB of memory
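The memory figures above can be roughly sanity-checked with back-of-the-envelope arithmetic: at 4 bits (half a byte) per weight, the bulk of each model's parameters compresses well under the stated budgets. This is a simplified estimate; attention and embedding weights are kept at higher precision and add overhead on top of it:

```python
BYTES_PER_MXFP4_PARAM = 0.5  # 4-bit quantization ~= half a byte per weight

def approx_weights_gb(total_params_billion):
    """Rough weight-storage estimate if every parameter were stored at 4 bits."""
    return total_params_billion * 1e9 * BYTES_PER_MXFP4_PARAM / 1e9

print(approx_weights_gb(117))  # ~58.5 GB -> fits on a single 80 GB GPU
print(approx_weights_gb(21))   # ~10.5 GB -> fits within a 16 GB budget
```

In practice the KV cache, activations, and the unquantized layers consume additional memory, which is why the 20B model's real-world footprint is quoted at 16 GB rather than the ~10.5 GB of weights alone.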
Conclusion
OpenAI’s GPT OSS models represent a breakthrough in open-source AI, delivering frontier reasoning capabilities under the Apache 2.0 license. Through Novita AI’s API infrastructure, developers can access these powerful models via serverless and dedicated endpoints with full OpenAI compatibility.
Whether building agentic workflows, conducting research, or developing production applications, GPT OSS provides the foundation for next-generation AI solutions. With advanced reasoning, tool use support, and flexible licensing, these models create unprecedented opportunities for AI innovation across industries.
Ready to get started? Experience GPT OSS models instantly at Novita AI’s model playground – no coding required. Sign up today and start building with OpenAI’s most advanced open-source models.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, along with affordable and reliable GPU cloud infrastructure for building and scaling.