Novita AI is excited to announce that GPT OSS – OpenAI’s groundbreaking open-weight language models – are now available through our inference API. The GPT OSS family comprises two state-of-the-art reasoning models: gpt-oss-120b and gpt-oss-20b. Both are released under the Apache 2.0 license.
According to OpenAI, this release is a meaningful step in their commitment to the open-source ecosystem, in line with their stated mission to make the benefits of AI broadly accessible. The models are designed for agentic workflows, tool use, and complex reasoning tasks, making them ideal for building sophisticated AI applications without the constraints of proprietary systems.
Novita AI offers the GPT OSS models at the following pricing:
- gpt-oss-120b: $0.10 input / $0.50 output per million tokens
- gpt-oss-20b: $0.05 input / $0.20 output per million tokens
Model Overview and Capabilities
GPT OSS models are mixture-of-experts (MoE) models that use a 4-bit quantization scheme (MXFP4), enabling fast inference while keeping resource usage low. Both models support chain-of-thought reasoning with adjustable reasoning effort levels, instruction following, and tool use.
| Model | Layers | Total Params | Active Params Per Token | Total Experts | Active Experts Per Token | Context Length |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 36 | 117B | 5.1B | 128 | 4 | 128k |
| gpt-oss-20b | 24 | 21B | 3.6B | 32 | 4 | 128k |
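As a quick sanity check on the table above, the active-parameter and active-expert fractions can be computed directly from the listed figures (a stdlib sketch; the numbers come straight from the table, everything else is plain arithmetic):

```python
# Figures from the model table above (parameter counts in billions)
models = {
    "gpt-oss-120b": {"total_b": 117, "active_b": 5.1, "experts": 128, "active_experts": 4},
    "gpt-oss-20b":  {"total_b": 21,  "active_b": 3.6, "experts": 32,  "active_experts": 4},
}

for name, m in models.items():
    param_frac = m["active_b"] / m["total_b"]          # share of weights used per token
    expert_frac = m["active_experts"] / m["experts"]   # share of experts routed per token
    print(f"{name}: {param_frac:.1%} of params active, "
          f"{expert_frac:.1%} of experts active per token")
```

Note that the active-parameter fraction is higher than the active-expert fraction: attention, embedding, and router weights are shared across all tokens, so only the MoE feed-forward weights are sparsified.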
gpt-oss-120b: High-Performance Reasoning
The gpt-oss-120b model features 117B total parameters with 5.1B active parameters. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on optimized infrastructure.
The model outperforms OpenAI o3‑mini and matches or exceeds OpenAI o4-mini on competition coding (Codeforces), general problem solving (MMLU and HLE), and tool calling (TauBench).
Test gpt-oss-120b in playground
gpt-oss-20b: Efficient Edge Reasoning
The gpt-oss-20b model contains 21B total parameters with 3.6B active parameters and is designed for efficient deployment. While the 120B model fits on a single H100 GPU, the 20B model runs within 16GB of memory, making it well suited to consumer hardware and on-device applications.
Despite its smaller size, it matches or exceeds OpenAI o3‑mini on standard benchmarks, even outperforming on competition mathematics (AIME 2024 & 2025) and health-related queries (HealthBench).
Test gpt-oss-20b in playground
Core Features and Technical Specifications
Architecture Details
- 21B and 117B total parameters with 3.6B and 5.1B active parameters, respectively
- 4-bit quantization scheme using mxfp4 format, applied only on the MoE weights
- Token-choice MoE with SwiGLU activations and softmax-after-topk for expert selection
- Rotary positional embeddings (RoPE) with a 128K context length
- Alternating attention layers: full-context attention interleaved with a sliding 128-token window
- Learned attention sink per-head for improved long-context performance
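The alternating attention pattern in the list above can be illustrated with a small mask-building sketch. This is a simplified illustration of the general technique, not the actual implementation; the 128-token window size comes from the architecture details listed here:

```python
def causal_mask(seq_len, window=None):
    """Build a boolean mask: True where query position q may attend to key k.

    window=None gives full causal attention; an integer limits attention
    to the `window` most recent tokens (inclusive of the current one).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never attend to future positions
            if window is not None:
                visible = visible and (q - k < window)
            row.append(visible)
        mask.append(row)
    return mask

# Alternate layer types as described: full-context, then sliding 128-token window
layer_masks = [causal_mask(256, window=None if i % 2 == 0 else 128) for i in range(4)]

# In a full-context layer, token 200 can see token 0; in a windowed layer it cannot
assert layer_masks[0][200][0] is True
assert layer_masks[1][200][0] is False
```

The windowed layers keep the attention cost per token bounded, while the interleaved full-context layers (helped by the learned attention sinks above) preserve long-range information flow.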
Key Capabilities
Reasoning Models: Text-only models with chain-of-thought and adjustable reasoning effort levels (“low”, “medium”, “high”)
Tool Use Support: Built-in support for web search, Python code execution, and custom tool integration
Structured Outputs: Native support for JSON, XML, and other structured data formats with schema validation
Responses API Compatibility: Full compatibility with OpenAI’s Responses API, the most advanced OpenAI interface for chat models, designed for more flexible and intuitive interactions
Apache 2.0 License: Maximum flexibility for commercial and research use. According to OpenAI, they aim for their tools to be used safely, responsibly, and democratically, while maximizing users' control over how they are used. By using gpt-oss, users agree to comply with all applicable laws.
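A minimal sketch of how a request might select one of the reasoning effort levels mentioned above. The `reasoning_effort` field name is an assumption about how a provider might expose this control alongside the usual chat parameters; consult the Novita AI API reference for the exact mechanism before relying on it:

```python
import json

# Hypothetical request payload; "reasoning_effort" is an assumed field name,
# not confirmed API -- check the provider's docs for the real parameter.
payload = {
    "model": "openai/gpt-oss-120b",
    "reasoning_effort": "high",  # "low", "medium", or "high", per the capability list above
    "messages": [
        {"role": "system", "content": "Be a helpful assistant"},
        {"role": "user", "content": "Prove that the sum of two even numbers is even."},
    ],
}

body = json.dumps(payload)
assert json.loads(body)["reasoning_effort"] in ("low", "medium", "high")
```

Higher effort levels trade latency and output tokens for deeper chain-of-thought, so "low" suits latency-sensitive chat while "high" suits competition-style math or coding tasks.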
Benchmark Performance

Safety Evaluation Results
OpenAI conducted comprehensive safety testing under their Preparedness Framework, including testing an adversarially fine-tuned version of gpt-oss-120b. Their methodology was reviewed by external experts and marks a step forward in setting new safety standards for open-weight models:
- Scalable Capability Evaluations: OpenAI confirmed that the default model does not reach their indicative thresholds for High capability in any of the three Tracked Categories (Biological and Chemical capability, Cyber capability, and AI Self-Improvement)
- Adversarial Fine-tuning Testing: Even with robust fine-tuning leveraging OpenAI’s field-leading training stack, gpt-oss-120b did not reach High capability in Biological and Chemical Risk or Cyber risk
- Frontier Risk Assessment: For most evaluations, the default performance of existing open models comes near to matching the adversarially fine-tuned performance of gpt-oss-120b
- External Review: OpenAI’s Safety Advisory Group (SAG) reviewed this testing and concluded the models meet safety standards
API Access Through Novita AI
Novita AI provides comprehensive access to GPT OSS models through both serverless and dedicated endpoints, with full OpenAI API compatibility.
Pricing and Model Details
Model Name: openai/gpt-oss-120b
- Input/Output Price (Novita AI):
- Input: $0.10 per million tokens
- Output: $0.50 per million tokens
- Context Size: 131,072
- Try it now: Test gpt-oss-120b in playground
Model Name: openai/gpt-oss-20b
- Input/Output Price (Novita AI):
- Input: $0.05 per million tokens
- Output: $0.20 per million tokens
- Context Size: 131,072
- Max Output: 32,768
- Try it now: Test gpt-oss-20b in playground
Get Started with Novita AI
Use the Playground (No Coding Required)
- Instant Access: Sign up and start experimenting with GPT OSS models in seconds
- Interactive Interface: Test complex reasoning prompts and visualize chain-of-thought outputs in real-time
- Model Comparison: Compare GPT OSS with other leading models for your specific use case
Integrate via API (For Developers)
Connect GPT OSS to your applications with Novita AI's unified REST API.
Option 1: Direct API Integration (Python Example)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="",
)

model = "openai/gpt-oss-120b"
stream = True  # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampler parameters outside the standard OpenAI schema go in extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Key Features:
- OpenAI-Compatible API for seamless integration
- Flexible parameter control for fine-tuning responses
- Streaming support for real-time responses
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build sophisticated multi-agent systems using GPT OSS:
- Plug-and-Play Integration: Use GPT OSS in any OpenAI Agents workflow
- Advanced Agent Capabilities: Support for handoffs, routing, and tool integration with superior reasoning performance
- Scalable Architecture: Design agents that leverage GPT OSS’s unified reasoning, coding, and agentic capabilities
Connect with Third-Party Platforms
- Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor, Trae, and Cline through OpenAI-compatible APIs
- Orchestration Frameworks: Connect with LangChain, Dify, CrewAI, Langflow, and other AI orchestration platforms using official connectors
- Hugging Face Integration: Novita AI serves as an official inference provider on Hugging Face
Model Architecture & Training
Pre-training and Model Development
The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models, including o3 and other frontier systems. They were trained extensively to leverage tool use as part of their reasoning process.
Post-training Optimization
Reinforcement Learning from Human Feedback (RLHF): Comprehensive alignment training for helpful, harmless, and honest responses
Safety Training: Extensive safety evaluations and adversarial testing to ensure responsible deployment
Reasoning Calibration: Fine-tuned reasoning effort control allowing optimization for different task complexities
Technical Innovation
Historic Open-Source Return: This marks OpenAI’s first open-weight language model since GPT-2, which was released more than five years ago, representing a meaningful step in their commitment to the open-source ecosystem
Advanced MoE Architecture: Sophisticated mixture-of-experts implementation with token-choice routing and optimized expert selection patterns
Efficient Quantization: Native 4-bit quantization using mxfp4 format enables fast inference while keeping resource usage low, with the 120B model fitting on a single 80GB GPU and the 20B model fitting in 16GB of memory
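The memory figures above can be roughly sanity-checked with back-of-the-envelope arithmetic: at 4 bits (half a byte) per weight, the bulk of each model's parameters compresses well under the stated budgets. This is a simplified estimate; attention and embedding weights are kept at higher precision and add overhead on top of it:

```python
BYTES_PER_MXFP4_PARAM = 0.5  # 4-bit quantization ~= half a byte per weight

def approx_weights_gb(total_params_billion):
    """Rough weight-storage estimate if every parameter were stored at 4 bits."""
    return total_params_billion * 1e9 * BYTES_PER_MXFP4_PARAM / 1e9

print(approx_weights_gb(117))  # ~58.5 GB -> fits on a single 80 GB GPU
print(approx_weights_gb(21))   # ~10.5 GB -> fits within a 16 GB budget
```

In practice the KV cache, activations, and the unquantized layers consume additional memory, which is why the 20B model's real-world footprint is quoted at 16 GB rather than the ~10.5 GB of weights alone.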
Conclusion
OpenAI’s GPT OSS models represent a breakthrough in open-source AI, delivering frontier reasoning capabilities under the Apache 2.0 license. Through Novita AI’s API infrastructure, developers can access these powerful models via serverless and dedicated endpoints with full OpenAI compatibility.
Whether building agentic workflows, conducting research, or developing production applications, GPT OSS provides the foundation for next-generation AI solutions. With advanced reasoning, tool use support, and flexible licensing, these models create unprecedented opportunities for AI innovation across industries.
Ready to get started? Experience GPT OSS models instantly at Novita AI’s model playground – no coding required. Sign up today and start building with OpenAI’s most advanced open-source models.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, along with affordable and reliable GPU cloud infrastructure for building and scaling.