Step 3.7 Flash API on Novita AI: Multimodal Reasoning, Pricing, and Launch

Step 3.7 Flash is available on Novita AI as a Serverless LLM API. It is aimed at developers who need a multimodal reasoning model that accepts text, image, and video input, calls tools, returns structured outputs, and works with a 256K-token context window through the chat completions endpoint. Use it when a workflow needs mixed-media context and a reasoned action plan, not when a small text-only model would already solve the job.

What is Step 3.7 Flash on Novita AI?

Step 3.7 Flash is StepFun’s high-efficiency multimodal reasoning model, hosted on Novita AI for Serverless LLM access. The API model ID is stepfun/step-3.7-flash, and the model is exposed through the chat completions endpoint.

Use Step 3.7 Flash when your workflow needs more than plain text chat. It fits agentic tasks that combine long instructions, visual or video context, structured output, and tool routing. Examples include analyzing a product walkthrough video, turning screenshots into implementation tasks, planning multi-step operations from mixed-media inputs, or letting the model decide when an application function should run.

It is not meant to replace every smaller text model in your stack. If your application only needs short FAQ answers, simple extraction, or high-volume classification, start by comparing current models in the Novita AI model library and Novita AI pricing. Step 3.7 Flash becomes more compelling when multimodal input, long context, or tool-aware planning is part of the actual product requirement.

Step 3.7 Flash specs, availability, and pricing

Novita AI currently lists Step 3.7 Flash as a Serverless LLM model with the following implementation details. Model availability and pricing can change, so check the live model page before production routing.

| Field | Current Novita AI value |
| --- | --- |
| Display name | Step 3.7 Flash |
| API model ID | stepfun/step-3.7-flash |
| Access path | Serverless LLM |
| Endpoint | chat/completions |
| Input modalities | Text, image, video |
| Output modality | Text |
| Context window | 262,144 tokens |
| Max output tokens | 256,000 tokens |
| Function calling | Supported |
| Structured outputs | Supported |
| Reasoning | Supported |
| Model family | StepFun |
| Architecture label | MoE |

The current token pricing shown for stepfun/step-3.7-flash is:

| Token type | Current price |
| --- | --- |
| Input tokens | $0.20 per million tokens |
| Cached-read input tokens | $0.04 per million tokens |
| Output tokens | $1.15 per million tokens |

The same model listing shows rate-limit tiers from T1 through T5. The visible T1 quota is 30 requests per minute (RPM) and 50,000,000 tokens per minute (TPM), with higher RPM values on higher tiers. Treat those as platform limits to verify during account setup, not as a substitute for your own load testing.

Pricing matters because multimodal and long-context requests can grow quickly. A product team should measure prompt size, media-derived context, cached-read reuse, and output length separately. If a workflow repeatedly sends the same system prompt, tool schema, or large instruction block, cached reads can become part of the cost design. At the listed prices an output token costs 5.75 times as much as an uncached input token ($1.15 versus $0.20 per million), so workflows with long responses should budget for output tokens first.
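To make the arithmetic concrete, here is a minimal cost sketch in Python that uses the three prices quoted above. The `Usage` class and `estimate_cost_usd` function are hypothetical names, and the sketch assumes cached-read tokens are billed at the cached rate instead of the standard input rate, which is worth confirming against your own invoices.

```python
from dataclasses import dataclass

# USD per million tokens, copied from the rate card quoted in this article.
# Prices can change, so refresh these from the live model page.
INPUT_PER_M = 0.20
CACHED_READ_PER_M = 0.04
OUTPUT_PER_M = 1.15


@dataclass
class Usage:
    input_tokens: int        # prompt tokens that were not served from cache
    cached_read_tokens: int  # prompt tokens served from cache (assumed billed at the cached rate)
    output_tokens: int       # generated tokens


def estimate_cost_usd(usage: Usage) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    total = (
        usage.input_tokens * INPUT_PER_M
        + usage.cached_read_tokens * CACHED_READ_PER_M
        + usage.output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# Example: a large cached instruction block, a smaller fresh prompt, a short answer.
example = Usage(input_tokens=20_000, cached_read_tokens=100_000, output_tokens=2_000)
print(f"${estimate_cost_usd(example):.4f}")
```

In this example the 100,000 cached tokens and the 20,000 uncached tokens each contribute $0.004, while the 2,000 output tokens add $0.0023, which shows how cache reuse and output length shift the total.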

One useful budgeting pattern is to separate evaluation traffic into three buckets. First, measure a plain text baseline for the same task. Second, add image or video input and record how often the extra context changes the answer. Third, test the long-context version with the full policy, schema, or product documentation attached. If the third bucket improves routing accuracy or reduces manual review, the larger request can be justified. If it does not, keep the production path narrower.
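The three-bucket comparison can be recorded with a small helper like the sketch below. Everything here is illustrative: `BucketResult`, `is_justified`, the 5-point accuracy threshold, and the sample numbers are hypothetical placeholders, and "correct" means whatever rubric your team applies.

```python
from dataclasses import dataclass


@dataclass
class BucketResult:
    name: str
    correct: int           # cases judged correct by your own rubric
    total: int             # cases run in this bucket
    total_cost_usd: float  # measured spend for the bucket

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def cost_per_case(self) -> float:
        return self.total_cost_usd / self.total


def is_justified(baseline: BucketResult, candidate: BucketResult,
                 min_accuracy_gain: float = 0.05) -> bool:
    """True when the wider request beats the baseline by the chosen margin."""
    return candidate.accuracy - baseline.accuracy >= min_accuracy_gain


# Made-up numbers, only to show the comparison.
text_only = BucketResult("text baseline", correct=70, total=100, total_cost_usd=0.10)
with_media = BucketResult("image or video", correct=85, total=100, total_cost_usd=0.40)
long_context = BucketResult("long context", correct=72, total=100, total_cost_usd=1.20)

for bucket in (with_media, long_context):
    print(bucket.name, is_justified(text_only, bucket), round(bucket.cost_per_case, 4))
```

The threshold is a product decision, not a property of the model: a team that pays for manual review may accept a smaller accuracy gain than a team that does not.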

What multimodal reasoning work does it fit?

Step 3.7 Flash is most interesting when the model has to reason across different kinds of input and then produce a plan, decision, or structured answer.

For product and support teams, that can mean asking the model to inspect a UI screenshot or short video clip, identify the user’s likely issue, and return a JSON object that routes the ticket to the right queue. For developer tools, it can mean reading a screen recording of a bug, the related error text, and a source snippet, then producing a reproduction checklist. For operations workflows, it can mean combining long policy text with visual evidence and asking the model to produce a step-by-step handling plan.

The important distinction is that Step 3.7 Flash should receive the evidence needed for the task. Do not ask it to infer details that were never supplied. If the workflow depends on a database lookup, billing state, order status, or deployment record, expose that data through your application layer or a tool call instead of relying on the model’s general knowledge.

Good evaluation prompts include:

  • A support triage prompt with one screenshot, the user’s description, and a required JSON schema.
  • A product QA prompt with a short video input and a bug report template.
  • A tool-routing prompt where the model must choose between create_ticket, search_docs, and escalate_to_human.
  • A long-context analysis prompt where the same tool schema and policy text can benefit from cached reads.
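
For the tool-routing prompt, the three tools can be declared in the OpenAI-style `tools` format that chat completions SDKs accept. This is a sketch: the tool names come from the list above, but every parameter (`title`, `body`, `query`, `reason`) is a hypothetical placeholder, and the helper `function_tool` is not part of any SDK.

```python
from typing import Dict, List


def function_tool(name: str, description: str,
                  properties: Dict[str, dict], required: List[str]) -> dict:
    """Build one OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Parameter names below are illustrative, not taken from any real service.
TOOLS = [
    function_tool(
        "create_ticket",
        "Open a support ticket when the issue is clear and actionable.",
        {"title": {"type": "string"}, "body": {"type": "string"}},
        ["title", "body"],
    ),
    function_tool(
        "search_docs",
        "Search product documentation when the user asks a how-to question.",
        {"query": {"type": "string"}},
        ["query"],
    ),
    function_tool(
        "escalate_to_human",
        "Hand off to a person when the request is risky or out of scope.",
        {"reason": {"type": "string"}},
        ["reason"],
    ),
]

print([tool["function"]["name"] for tool in TOOLS])
```

With the OpenAI Python SDK, a list like this would typically be passed as `tools=TOOLS` (optionally with `tool_choice="auto"`) to `client.chat.completions.create`. Clear, mutually exclusive descriptions matter here because they are the decision boundary the model sees.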

Avoid starting with vague prompts such as “analyze this video” or “reason about this image.” Give the model the job, the decision boundary, and the output format. That makes it easier to compare results across models and easier to measure whether the extra context and multimodal input are paying for themselves.
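
As an illustration of stating the job, the decision boundary, and the output format, the sketch below builds a support-triage message with one screenshot. It assumes the endpoint accepts OpenAI-style `image_url` content parts; the queue names, `ROUTING_SCHEMA`, and `build_triage_messages` are hypothetical. The request shape for video input is not shown because it should be taken from the Novita AI documentation.

```python
import json
from typing import List

# Hypothetical output contract for the triage task.
ROUTING_SCHEMA = {
    "type": "object",
    "properties": {
        "queue": {"type": "string", "enum": ["billing", "bug", "account", "needs_more_info"]},
        "reason": {"type": "string"},
    },
    "required": ["queue", "reason"],
}


def build_triage_messages(description: str, screenshot_url: str) -> List[dict]:
    """Return chat messages that spell out the job, the boundary, and the format."""
    system = (
        "You triage support tickets. "                                   # the job
        "Use only the screenshot and the user's description. "           # the decision boundary
        "If they do not identify the issue, choose needs_more_info. "
        "Reply with one JSON object matching this schema: "              # the output format
        + json.dumps(ROUTING_SCHEMA)
    )
    return [
        {"role": "system", "content": system},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": description},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        },
    ]


messages = build_triage_messages(
    "I was charged twice after upgrading my plan.",
    "https://example.com/screenshot.png",  # placeholder URL
)
print(json.dumps(messages, indent=2))
```

The resulting list would be passed as `messages=` alongside `model="stepfun/step-3.7-flash"`. Because the schema and boundary are fixed, the same prompt can be replayed against other models for a like-for-like comparison.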

For agent workflows, the model’s tool support is the part to test most carefully. A good tool-calling evaluation should include cases where the correct answer is to call a tool, cases where the correct answer is to ask for more information, and cases where no tool should run. That prevents the evaluation from rewarding over-eager actions just because the model can emit a function call.
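
One way to keep that evaluation honest is to score over-eager calls separately from other mistakes. The sketch below is a hypothetical scorer: the labels `ASK_USER` and `NO_TOOL` and the function `score_tool_decisions` are invented for illustration, and it assumes a reviewer or script has already reduced each model response to a single label.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ASK_USER = "ASK_USER"  # the right move is a clarifying question
NO_TOOL = "NO_TOOL"    # the right move is a direct answer with no tool
NON_TOOL = {ASK_USER, NO_TOOL}


def score_tool_decisions(cases: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Score (expected, observed) pairs; each value is a tool name or a non-tool label."""
    stats: Counter = Counter()
    for expected, observed in cases:
        stats["total"] += 1
        if expected == observed:
            stats["correct"] += 1
        elif expected in NON_TOOL and observed not in NON_TOOL:
            stats["over_eager"] += 1    # called a tool when none should run
        elif expected not in NON_TOOL and observed in NON_TOOL:
            stats["missed_call"] += 1   # should have called a tool but did not
        else:
            stats["wrong_choice"] += 1  # wrong tool, or asked when it should have answered
    return dict(stats)


results = score_tool_decisions([
    ("create_ticket", "create_ticket"),
    (ASK_USER, "search_docs"),
    (NO_TOOL, "escalate_to_human"),
    ("search_docs", NO_TOOL),
    ("search_docs", "create_ticket"),
])
print(results)
```

Tracking `over_eager` on its own line makes it visible when a prompt or schema change raises overall accuracy only by making the model call tools more often.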

How should teams evaluate it before production?

Start with a small test set that resembles your product, not a generic benchmark prompt. Include successful cases, edge cases, and prompts that should not trigger a tool call. If your application needs structured output, validate the output against your schema instead of checking it manually.
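
Automatic validation can start small. The sketch below checks a model reply against a hypothetical ticket-routing contract using only the Python standard library. The field names, the allowed priorities, and `validate_routing` are assumptions for illustration; a production service would more likely use a full JSON Schema validator.

```python
import json
from typing import List

# Hypothetical contract: field name -> required Python type.
REQUIRED_FIELDS = {"queue": str, "priority": str, "summary": str}
ALLOWED_PRIORITIES = {"low", "medium", "high"}


def validate_routing(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the reply is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for field: {field}")

    priority = data.get("priority")
    if isinstance(priority, str) and priority not in ALLOWED_PRIORITIES:
        errors.append(f"unknown priority: {priority}")
    return errors


print(validate_routing('{"queue": "billing", "priority": "high", "summary": "Double charge"}'))
print(validate_routing('{"queue": "billing", "priority": "urgent"}'))
```

Returning a list of problems instead of a boolean makes it easy to log why a reply was rejected and to count failure types across the test set.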

A minimal OpenAI-compatible text request uses the Novita AI base URL and the verified model ID:

import os
from openai import OpenAI

# Point the OpenAI SDK at Novita AI's OpenAI-compatible base URL.
client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],  # read the key from the environment
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="stepfun/step-3.7-flash",
    messages=[
        {
            "role": "system",
            "content": "You are a practical incident triage assistant. Return concise, structured recommendations.",
        },
        {
            "role": "user",
            "content": "Review this incident summary and identify the next three checks: API latency doubled after a deploy, database CPU is normal, error rate is flat.",
        },
    ],
    max_tokens=700,   # cap output length to keep evaluation cost predictable
    temperature=0.2,  # low temperature for more repeatable answers
)

# Print the text of the first (and only) completion choice.
print(response.choices[0].message.content)

For production evaluation, add four checks before routing real user traffic:

  • Cost check: log input, cached-read, and output tokens for representative requests.
  • Schema check: validate structured outputs automatically and retry or fall back when responses do not match.
  • Tool check: test both tool-call and no-tool-call cases, including ambiguous prompts.
  • Media check: evaluate the actual image or video formats your app sends, not only text summaries of media.
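
The retry-or-fall-back behavior in the schema check can be sketched as a small wrapper. The function name `call_with_schema_check` and its parameters are hypothetical; `generate` stands in for whatever function calls the model, and `validate` is any checker that returns a list of problems (empty when the reply is valid).

```python
from typing import Callable, List, Tuple


def call_with_schema_check(
    generate: Callable[[], str],
    validate: Callable[[str], List[str]],
    fallback: Callable[[], str],
    max_attempts: int = 2,
) -> Tuple[str, str]:
    """Return (reply, source), retrying invalid replies before falling back."""
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        if not validate(raw):  # an empty problem list means the reply passed
            return raw, f"model_attempt_{attempt}"
    # Every attempt failed validation, so use the safe default path.
    return fallback(), "fallback"


# Simulated model replies: the first is malformed, the second is valid.
replies = iter(["sorry, here is the answer", '{"queue": "billing"}'])
reply, source = call_with_schema_check(
    generate=lambda: next(replies),
    validate=lambda raw: [] if raw.startswith("{") else ["not a JSON object"],
    fallback=lambda: '{"queue": "human_review"}',
)
print(reply, source)
```

Returning the source label alongside the reply lets the cost check and dashboards distinguish first-try successes from retries and fallbacks.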

Function calling and structured outputs are useful, but they do not remove application responsibility. Your service still needs authorization checks, input validation, idempotent tool execution, and audit logs for actions that change user data.
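
A minimal sketch of those responsibilities, assuming each model tool call carries a unique call ID that can serve as an idempotency key. `ToolExecutor`, its permission map, and the in-memory stores are hypothetical; argument validation is left out for brevity, and a real service would persist results and audit entries.

```python
from typing import Any, Callable, Dict, List, Set


class ToolExecutor:
    """Runs model-requested tools with authorization, idempotency, and an audit log."""

    def __init__(self, handlers: Dict[str, Callable[..., Any]],
                 permissions: Dict[str, Set[str]]) -> None:
        self.handlers = handlers            # tool name -> Python callable
        self.permissions = permissions      # user id -> tools that user may run
        self.results: Dict[str, Any] = {}   # call id -> stored result
        self.audit_log: List[dict] = []

    def _audit(self, user_id: str, call_id: str, name: str, status: str) -> None:
        self.audit_log.append(
            {"user": user_id, "call": call_id, "tool": name, "status": status}
        )

    def execute(self, user_id: str, call_id: str, name: str, args: dict) -> Any:
        if name not in self.handlers:
            raise KeyError(f"unknown tool: {name}")
        if name not in self.permissions.get(user_id, set()):
            self._audit(user_id, call_id, name, "denied")
            raise PermissionError(f"{user_id} may not run {name}")
        if call_id in self.results:
            # Same call ID seen before: return the stored result, do not re-run.
            self._audit(user_id, call_id, name, "replayed")
            return self.results[call_id]
        result = self.handlers[name](**args)
        self.results[call_id] = result
        self._audit(user_id, call_id, name, "executed")
        return result


tickets: List[str] = []


def create_ticket(title: str) -> int:
    tickets.append(title)
    return len(tickets)  # ticket number


executor = ToolExecutor({"create_ticket": create_ticket}, {"alice": {"create_ticket"}})
first = executor.execute("alice", "call_1", "create_ticket", {"title": "Double charge"})
again = executor.execute("alice", "call_1", "create_ticket", {"title": "Double charge"})
print(first, again, tickets)
```

The second call with the same ID returns the stored ticket number without creating a duplicate, which is the behavior you want when a request is retried after a timeout.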

For multimodal requests, keep the media handling path explicit. Store or reference the asset according to your application’s privacy rules, preserve enough metadata to debug failures, and record which request format was used. If a production issue appears later, you will want to know whether the model saw the original image or video, a compressed version, a frame sample, or a text summary generated by another service.
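
One way to keep that information is to write a small record next to each request. The sketch below is hypothetical throughout: `MediaVariant`, `MediaAuditRecord`, `record_media`, and the field names are illustrative, and where the record is stored depends on your privacy rules.

```python
import hashlib
from dataclasses import asdict, dataclass
from enum import Enum


class MediaVariant(str, Enum):
    ORIGINAL = "original"
    COMPRESSED = "compressed"
    FRAME_SAMPLE = "frame_sample"
    TEXT_SUMMARY = "text_summary"


@dataclass(frozen=True)
class MediaAuditRecord:
    request_id: str
    asset_ref: str         # pointer to the stored asset, not the asset itself
    variant: MediaVariant  # what the model actually received
    sha256: str            # fingerprint of the exact bytes or text that were sent
    request_format: str    # for example "image_url" or "inline_text"


def record_media(request_id: str, asset_ref: str, payload: bytes,
                 variant: MediaVariant, request_format: str) -> MediaAuditRecord:
    """Fingerprint the payload that was sent and describe how it was sent."""
    digest = hashlib.sha256(payload).hexdigest()
    return MediaAuditRecord(request_id, asset_ref, variant, digest, request_format)


record = record_media(
    request_id="req-001",
    asset_ref="assets/bug-recording-frame-12.png",  # placeholder path
    payload=b"fake image bytes for the example",
    variant=MediaVariant.FRAME_SAMPLE,
    request_format="image_url",
)
print(asdict(record))
```

Hashing the payload that was sent, not the original upload, is the design choice that matters: it lets you prove later which version the model saw without keeping a second copy of the media.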

How does this overview relate to a separate quick-start guide?

This article is the launch and source-of-truth overview: availability, model ID, pricing, multimodal scope, and developer fit. A separate Step 3.7 Flash quick-start article can go deeper on request payloads, image and video inputs, function calling examples, and structured-output patterns.

That split is useful because launch readers usually need to answer, “Should we evaluate this model?” Quick-start readers need to answer, “What exact request should I send?” Keeping those jobs separate avoids burying pricing and capability facts inside a long tutorial, while still leaving room for implementation detail where it belongs.

For now, the best next step is to open the Step 3.7 Flash model page, confirm the current rate card and limits for your account, and run a narrow evaluation prompt that uses the same media, tool schema, or structured output your application will need.

FAQ

Is Step 3.7 Flash available on Novita AI?

Yes. Novita AI currently lists Step 3.7 Flash as a Serverless LLM model with the API model ID stepfun/step-3.7-flash.

What inputs does Step 3.7 Flash support?

The Novita AI model page currently lists text, image, and video as supported input modalities. The output modality is text.

How much does Step 3.7 Flash cost on Novita AI?

Current Novita AI pricing for stepfun/step-3.7-flash is $0.20 per million input tokens, $0.04 per million cached-read input tokens, and $1.15 per million output tokens.

Does Step 3.7 Flash support function calling?

Yes. The Novita AI model page currently lists function calling, structured outputs, and reasoning support for Step 3.7 Flash.

What endpoint should developers use?

Use Novita AI’s OpenAI-compatible chat completions endpoint with the model ID stepfun/step-3.7-flash. The base URL for OpenAI-compatible SDK usage is https://api.novita.ai/openai.