DeepSeek-V3.2-Exp on Novita AI: Half the Price with Sparse Attention for Long-Context AI

DeepSeek-V3.2-Exp

DeepSeek has just launched DeepSeek-V3.2-Exp, an experimental model that solves one of AI’s biggest challenges: processing long documents efficiently and affordably.

Built on DeepSeek-V3.1-Terminus, this new model introduces DeepSeek Sparse Attention (DSA), an attention mechanism that halves API pricing and delivers significant speedups in long-context scenarios.

At Novita AI, we’re bringing this cutting-edge model to developers through our easy-to-use API platform. Whether you’re building document analysis tools, code assistants, or chatbots that need to remember entire conversations, DeepSeek-V3.2-Exp delivers the efficiency and cost savings you need without sacrificing quality.

What Makes DeepSeek-V3.2-Exp Special?

DeepSeek-V3.2-Exp is an experimental AI model designed to handle long documents and conversations more efficiently than traditional models.

The “Exp” stands for experimental—DeepSeek is testing a new approach to see how well it performs in real-world applications.

The Problem It Solves

Traditional AI models slow down dramatically when processing long texts.

Reading a 100-page document or maintaining a lengthy conversation becomes expensive and time-consuming. This happens because standard models need to process every single word in relation to every other word—the longer the text, the more calculations required.

The Solution: Sparse Attention

DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA), which works like a smart filter.

Instead of analyzing every word against every other word, the model identifies and focuses only on the most relevant parts. Think of it like speed-reading: you don’t read every word with equal attention—you focus on what matters most.
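The idea can be sketched with a toy single-query attention function. The `top_k` path below is a simplified stand-in for DSA's token selection, not DeepSeek's actual implementation:

```python
import math

def attend(q, keys, values, top_k=None):
    """Single-query attention over lists of vectors.

    top_k=None -> dense: softmax over every key.
    top_k=k    -> sparse: softmax over only the k best-scoring keys,
                  a toy stand-in for DSA's token selection.
    """
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
              for key in keys]
    idx = list(range(len(keys)))
    if top_k is not None:
        # Keep only the top_k most relevant tokens.
        idx = sorted(idx, key=lambda i: scores[i], reverse=True)[:top_k]
    m = max(scores[i] for i in idx)
    weights = {i: math.exp(scores[i] - m) for i in idx}
    z = sum(weights.values())
    # Mix only the selected value vectors.
    return [sum(weights[i] * values[i][j] for i in idx) / z
            for j in range(len(values[0]))]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
print(attend(q, keys, values, top_k=2))  # mixes only the two most relevant keys
```

With `top_k` equal to the full key count, the sparse path reproduces dense attention exactly; the savings come from choosing `top_k` much smaller than the sequence length.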

Key Features

  • Context Length: Handles up to 128,000 tokens (approximately 96,000 words or 300+ pages)
  • Half the Price: 50% lower cost compared to DeepSeek-V3.1-Terminus for long-context processing
  • Significant Speedup: Dramatic efficiency improvements in both training and inference, especially in long-context scenarios
  • Architecture: Built on DeepSeek-V3.1-Terminus with the addition of DeepSeek Sparse Attention
  • Same Quality: Maintains performance comparable to DeepSeek-V3.1-Terminus

The model builds on the proven DeepSeek-V3.1-Terminus foundation, which already supported 128K context length, but adds this intelligent efficiency layer through continued training.
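As a back-of-the-envelope check on those context figures (the tokens-per-word and words-per-page values are rough rules of thumb, not DeepSeek specifications):

```python
CONTEXT_LIMIT = 128_000  # tokens

def rough_tokens(words, tokens_per_word=4 / 3):
    # Rule of thumb for English text: roughly 0.75 words per token.
    return int(words * tokens_per_word)

pages = 300
words_per_page = 320            # illustrative density for a text-heavy page
words = pages * words_per_page  # 96,000 words
print(rough_tokens(words))      # 128000 -- right at the context limit
```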

Cost Efficiency Breakthrough

DeepSeek Sparse Attention (DSA) reduces the core attention complexity from O(L²) to O(Lk), where k is the number of selected tokens (much smaller than L).

Although the lightning indexer still has O(L²) complexity, it requires much less computation compared with the main attention mechanism. Combined with optimized implementation, DSA achieves a significant end-to-end speedup in long-context scenarios.
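To make the complexity difference concrete, here is the raw count of query-key pairs at the full context length; the value of k is an illustrative assumption, not an official figure:

```python
L = 128_000  # full context length in tokens
k = 2_048    # selected tokens per query -- illustrative assumption

dense_pairs = L * L   # O(L^2): every token scored against every other token
sparse_pairs = L * k  # O(L*k): every token scored against k selected tokens

print(f"dense:  {dense_pairs:>17,}")
print(f"sparse: {sparse_pairs:>17,}  ({dense_pairs / sparse_pairs:.1f}x fewer)")
```

At this scale the main attention does about 62x less scoring work, which is where the long-context savings come from.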

DeepSeek benchmarked DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp on actual deployed services running on H800 GPUs, priced at 2 USD per GPU hour.

The results demonstrate dramatic efficiency improvements, especially as context length increases.

Figure: Inference costs of DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp as context length grows.

Learn more about the architecture and implementation details in the official technical documentation.

Performance: Does It Actually Work?

DeepSeek evaluated the model on a suite of benchmarks focusing on diverse capabilities.

Overall, DeepSeek-V3.2-Exp does not show substantial performance degradation compared with DeepSeek-V3.1-Terminus.

General Knowledge

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| --- | --- | --- |
| MMLU-Pro | 85.0 | 85.0 |
| GPQA-Diamond | 80.7 | 79.9 |
| Humanity’s Last Exam | 21.7 | 19.8 |

Note: Scores on GPQA, HLE, and HMMT 2025 are lower mainly because DeepSeek-V3.2-Exp generates fewer reasoning tokens; on intermediate checkpoints that produce comparable token counts, the gap closes.

Web Search and Agents

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| --- | --- | --- |
| BrowseComp | 38.5 | 40.1 |
| BrowseComp_zh | 45.0 | 47.9 |
| SimpleQA | 96.8 | 97.1 |

Interestingly, the model actually improves on search tasks! This suggests sparse attention may help the model focus on relevant information when retrieving answers from long contexts.

Code Generation

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| --- | --- | --- |
| LiveCodeBench (2408-2505) | 74.9 | 74.1 |
| Codeforces-Div1 Rating | 2046 | 2121 |
| Aider-Polyglot | 76.1 | 74.5 |

The model shows strong coding ability, even achieving a higher competitive programming rating (2121 is expert-level on Codeforces).

Code Agents

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| --- | --- | --- |
| SWE Verified (Agent mode) | 68.4 | 67.8 |
| SWE-bench Multilingual (Agent mode) | 57.8 | 57.9 |
| Terminal-bench (Terminus 1 framework) | 36.7 | 37.7 |

The model maintains strong agent capabilities for solving real-world software engineering tasks.

Mathematics

| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| --- | --- | --- |
| AIME 2025 | 88.4 | 89.3 |
| HMMT 2025 | 86.1 | 83.6 |

The model performs exceptionally well on AIME 2025 (a challenging high school math competition), solving 89.3% of problems.

Training Stability

DeepSeek compared the reinforcement learning training curves of both models on BrowseComp and SWE Verified.

The performance of both models improved steadily throughout the training process with closely aligned curves, reflecting the training stability of DSA.

Getting Started on Novita AI

Accessing DeepSeek-V3.2-Exp through Novita AI offers multiple pathways tailored to different technical expertise levels and use cases.

Whether you’re a business user exploring AI capabilities or a developer building production applications, Novita AI provides the tools you need.

Use the Playground (No Coding Required)

  • Instant Access: Sign up and start experimenting with DeepSeek-V3.2-Exp in seconds
  • Interactive Interface: Test prompts and visualize outputs in real-time
  • Model Comparison: Compare DeepSeek-V3.2-Exp with other leading models for your specific use case

The playground enables you to test various prompts and see immediate results without any technical setup.

Perfect for prototyping, testing ideas, and understanding model capabilities before full implementation.

Integrate via API (For Developers)

Connect DeepSeek-V3.2-Exp to your applications with Novita AI’s unified REST API.

Option 1: Direct API Integration

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/openai",
    # Never hard-code API keys; load yours from the environment instead.
    api_key=os.environ["NOVITA_API_KEY"],
)

model = "deepseek/deepseek-v3.2-exp"
stream = True  # set to False for a single, non-streamed response
max_tokens = 81920
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters outside the standard OpenAI schema go through extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Option 2: Multi-Agent Workflows with OpenAI Agents SDK

Build sophisticated multi-agent systems leveraging DeepSeek-V3.2-Exp’s capabilities:

  • Plug-and-Play Integration: Use DeepSeek-V3.2-Exp in any OpenAI Agents workflow
  • Advanced Agent Capabilities: Support for handoffs, routing, and tool integration
  • Scalable Architecture: Design agents that leverage DeepSeek-V3.2-Exp’s efficient long-context processing
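At the API level, a multi-agent setup needs nothing exotic. The sketch below routes a query to one of two role prompts and builds an OpenAI-compatible request payload; the agent names, prompts, and routing heuristic are all made up for illustration:

```python
# Each "agent" is just a system prompt; route() picks one by keyword.
AGENTS = {
    "coder":   "You are a coding assistant. Return runnable code.",
    "analyst": "You summarize and analyze long documents.",
}

MODEL = "deepseek/deepseek-v3.2-exp"

def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("code", "bug", "function", "refactor")):
        return "coder"
    return "analyst"

def build_request(query: str) -> dict:
    # Payload shape matches the OpenAI-compatible chat completions API,
    # ready to pass to client.chat.completions.create(**request).
    name = route(query)
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": AGENTS[name]},
            {"role": "user", "content": query},
        ],
    }

req = build_request("Summarize this 200-page contract")
print(req["messages"][0]["content"])  # You summarize and analyze long documents.
```

The same pattern extends to handoffs: an agent's reply can itself be fed to `route()` or to another agent, with DeepSeek-V3.2-Exp's long context carrying the accumulated conversation.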

Connect with Third-Party Platforms

Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor, Codex, Claude Code, Trae, Qwen Code, and Cline through OpenAI-compatible APIs and Anthropic-compatible APIs.

Orchestration Frameworks: Connect with LangChain, Dify, CrewAI, Langflow, and other AI orchestration platforms using official connectors.

Hugging Face Integration: Novita AI is an official Hugging Face inference provider, ensuring broad ecosystem compatibility.

Conclusion

DeepSeek-V3.2-Exp represents a significant advancement in efficient and affordable long-context AI processing.

Through DeepSeek Sparse Attention, the model halves the price of DeepSeek-V3.1-Terminus and delivers substantial efficiency gains in both training and inference, especially in long-context scenarios, while maintaining comparable output quality.

DeepSeek is actively pursuing further large-scale testing in real-world scenarios to uncover potential limitations of the sparse attention architecture.

Novita AI makes it simple to access this experimental technology through our developer-friendly API platform—no infrastructure complexity, just powerful AI at your fingertips with 50% cost savings.

Ready to experience the future of efficient and affordable long-context AI? Start exploring DeepSeek-V3.2-Exp in the Playground today.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.

