B200 on Novita AI: Only $4.77/hr for Running DeepSeek R1!

B200 on Novita AI

The NVIDIA B200 is here—bare metal, blazing fast, and now available at Novita AI.

Experience over 30,000 tokens per second on DeepSeek-R1 with the NVIDIA B200—powered by next-gen Blackwell architecture and built for LLMs, generative AI, and enterprise-scale inference. No hardware? No problem. Novita AI gives you on-demand access at a fraction of the cost.

What is B200?

The NVIDIA B200 is a next-generation GPU built on the Blackwell architecture, delivering breakthrough AI performance with up to 72 petaFLOPS for training and 144 petaFLOPS for inference, making it ideal for large language models and generative AI workloads.

b200

Key Performance Metrics

ComponentDetails
GPU8x NVIDIA Blackwell GPUs
GPU Memory1,440GB total, 64TB/s HBM3e bandwidth
Performance72 petaFLOPS FP8 training and 144 petaFLOPS FP4 inference
NVIDIA NVSwitch2x
NVIDIA NVLink Bandwidth14.4 TB/s aggregate bandwidth
System Power Usage~14.3kW max
CPU2x Intel Xeon Platinum 8570 Processors, 112 Cores total, 2.1 GHz (Base), 4 GHz (Max Boost)
System Memory2TB, configurable to 4TB
Networking4x OSFP ports (8x single-port NVIDIA ConnectX-7 VPI), Up to 400Gb/s Infiniband/Ethernet; 2x dual-port QSFP112 BlueField-3 DPU, Up to 400Gb/s Infiniband/Ethernet
Management Network10Gb/s onboard NIC with RJ45; 100Gb/s dual-port Ethernet NIC; BMC with RJ45
StorageOS: 2x 1.9TB NVMe M.2, Internal: 8x 3.84TB NVMe U.2
SoftwareNVIDIA AI Enterprise, NVIDIA Mission Control, NVIDIA Runai Technology, NVIDIA DGX OS / Ubuntu
Rack Units (RU)10 RU
System DimensionsHeight: 17.5in (444mm), Width: 19.0in (482.2mm), Length: 35.3in (897.1mm)
Operating Temperature5–30°C (41–86°F)
Enterprise Support3-year standard support for hardware/software; 24/7 support portal access; live agent during business hours

This is essentially an AI supercomputer that can:

  • Train advanced AI models (like ChatGPT, Claude)
  • Serve thousands of AI inference requests simultaneously
  • Process massive datasets for research or business intelligence
  • Power AI applications for entire organizations

Cost Efficiency of the B200

b200 cost
b200 cost

This price isn’t for a single GPU. It typically covers a full DGX system with:

  • 8x Blackwell B200 GPUs
  • Dual Xeon CPUs
  • High-speed NVLink & NVSwitch interconnects
  • 1.44TB HBM3e GPU memory
  • 2TB+ system RAM
  • Enterprise-grade networking, SSDs, and software stack (NVIDIA AI Enterprise, Run:ai, etc.)

Application of B200

1. AI Training and Inference

The B200 excels in both training and inference of large-scale AI models, particularly large language models (LLMs) and generative AI applications.

  • Training: With its second-generation Transformer Engine and support for FP4 precision, the B200 accelerates the training process of massive models, reducing time and computational resources required.
  • Inference: The B200 delivers exceptional inference performance, achieving over 1,000 tokens per second per user in benchmarks with models like Meta’s Llama 4 Maverick.

2. Graphics and Visualization

While primarily designed for AI workloads, the B200 also supports advanced graphics and visualization tasks:

  • Ray Tracing: Equipped with 4th-generation RT Cores, the B200 can perform real-time ray tracing, enabling high-fidelity rendering for complex scenes.
  • Shader Technology: The GPU’s architecture includes enhancements like Shader Execution Reordering (SER) 2.0, optimizing the execution of complex shader programs and compute kernels.

3. Precision Workloads

The B200 is adept at handling precision-intensive tasks across various scientific and industrial domains:

  • The NVIDIA B200 GPU, based on the Blackwell architecture, does support floating-point (FP) operations through its fifth-generation Tensor Cores. These Tensor Cores are designed to handle a variety of FP precisions, including FP4, FP6, FP8, FP16, and TF32, enabling efficient acceleration of AI workloads.

B200+Deepseek R1=World Record 

GPU SetupTotal ThroughputNotes
8x NVIDIA B200 (DGX B200)Over 30,000 tokens/sAchieved using TensorRT-LLM with FP4 precision
8x NVIDIA H200 (HGX H200)Up to 3,872 tokens/sUtilized NVIDIA NIM microservice
8x NVIDIA H100 (Quantized 4-bit)Approximately 2,500 tokens/sDeployed with vLLM 0.7.3
4x NVIDIA H10025 tokens/sReal-world test on Lambda Cloud
Single NVIDIA RTX 4090 (24GB)3–33 tokens/sPerformance varies based on quantization and configuration

In summary, the NVIDIA B200 GPU stands out as the top performer for DeepSeek-R1 inference, offering unparalleled throughput that enables efficient deployment of large language models in enterprise environments.

How to Run B200 at a very Cost-effectively Price?

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing a affordable and reliable GPU cloud for building and scaling.

Step 1: Log In and Access the GPU Bare Metal

Log in to your account and click on the GPU Bare Metal button.

Log In and Access the GPU Bare Metal

Step2: Chooose Your GPU

Step2: Chooose Your GPU

Select the Device

  • Device Name: Choose H100 SXM or B200 SXM.
  • Region: United States of America.
  • Configuration (for H100 SXM):
    • 8 GPUs
    • 2048 GB Memory
    • 104vCPU/Node
    • 15.36 TB Storage
    • at $1.7./hour.
  • Configuration (for B200 SXM):
    • 8 GPUs
    • 2304 GB Memory
    • 144vCPU/Node
    • 30.8 TB Storage
    • at $4.77./hour.

Set the Quantity and Rental Duration

  • Adjust the GPU Quantity field to match your needs. For example, select 8 GPUs.
  • Choose the rental duration. For instance, set it to 1 month.

The NVIDIA B200 GPU, built on the Blackwell architecture, is a cutting-edge AI supercomputer designed for extreme performance across training, inference, graphics, and precision workloads. With world-record throughput for DeepSeek-R1 and an optimized FP4/FP8 Transformer Engine, it delivers unmatched efficiency for large-scale generative AI. While the initial system cost is high, platforms like Novita AI offer flexible, cost-effective cloud access to B200 hardware—making top-tier AI accessible without infrastructure overhead.

Frequently Asked Questions

What is the NVIDIA B200?

The B200 is NVIDIA’s latest GPU based on the Blackwell architecture, offering 72 petaFLOPS (FP8) for training and 144 petaFLOPS (FP4) for inference—ideal for LLMs, generative AI, and scientific computing.

What models run best on B200?

The B200 excels with large models like DeepSeek-R1, Llama 4 Maverick, and other multi-billion parameter LLMs, thanks to its high throughput and memory.

What’s the DeepSeek-R1 inference performance on B200?

With 8x B200 GPUs, DeepSeek-R1 achieves over 30,000 tokens per second, far exceeding H100 and H200 setups.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.


Discover more from Novita

Subscribe to get the latest posts sent to your email.

Leave a Comment

Scroll to Top

Discover more from Novita

Subscribe now to keep reading and get access to the full archive.

Continue reading