Kimi K2.7 Code is available on Novita AI with the model ID moonshotai/kimi-k2.7-code, an OpenAI-compatible chat/completions endpoint, a 262,144-token context window, and support for text, image, and video inputs. This quick start covers the developer setup: authenticate, send your first request, use vision input, add function calling, and understand pricing before building. For a broader look at positioning and use cases, see the Kimi K2.7 Code on Novita AI overview.
Kimi K2.7 Code API Setup
Start with three pieces of configuration:
| Item | Value |
|---|---|
| API key | Create and store a Novita AI API key in an environment variable such as NOVITA_API_KEY. |
| OpenAI-compatible base URL | https://api.novita.ai/openai |
| Chat completions endpoint | POST https://api.novita.ai/openai/v1/chat/completions |
| Model ID | moonshotai/kimi-k2.7-code |
The Novita AI documentation index lists the OpenAI-compatible base URL, and the chat completions API reference documents the full request and response fields.
Keep the API key out of source control. Export it in your shell for local development:
export NOVITA_API_KEY="your_api_key"
If your application already uses the OpenAI SDK, the change is minimal: point the base URL at Novita AI and set the model to moonshotai/kimi-k2.7-code.
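A minimal sketch of keeping these settings in one place, assuming the NOVITA_API_KEY variable name used above. `NovitaConfig` and `load_config` are hypothetical names introduced for illustration and are not part of any SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping

NOVITA_BASE_URL = "https://api.novita.ai/openai"
KIMI_MODEL_ID = "moonshotai/kimi-k2.7-code"


@dataclass(frozen=True)
class NovitaConfig:
    """Hypothetical container for the settings listed in the table above."""
    api_key: str
    base_url: str = NOVITA_BASE_URL
    model: str = KIMI_MODEL_ID


def load_config(env: Mapping[str, str] = os.environ) -> NovitaConfig:
    """Read the API key from the environment and fail early if it is missing."""
    api_key = env.get("NOVITA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("NOVITA_API_KEY is not set; export it before running.")
    return NovitaConfig(api_key=api_key)
```

The loaded values can then be passed to the OpenAI client as `base_url` and `api_key`, with `model` used in each request.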
Kimi K2.7 Code Pricing and Limits
Use the exact model ID in code. In user-facing UI, use the display name “Kimi K2.7 Code”.
| Field | Current Novita value |
|---|---|
| Display name | Kimi K2.7 Code |
| API model ID | moonshotai/kimi-k2.7-code |
| Model series | MoonshotAI |
| Architecture | MoE, 1T parameters total, 32B activated |
| Endpoint families | chat/completions, anthropic |
| Input modalities | Text, image, video |
| Output modality | Text |
| Context window | 262,144 tokens |
| Max output tokens | 262,144 tokens |
| Features | Function calling, structured outputs, reasoning |
As of June 16, 2026, Novita lists these token prices for moonshotai/kimi-k2.7-code:
| Token type | Listed price |
|---|---|
| Input tokens | $0.95 per 1M tokens |
| Output tokens | $4.00 per 1M tokens |
| Cache read input tokens | $0.19 per 1M tokens |
Pricing, availability, and rate limits can change. Check the Kimi K2.7 Code model page and the Novita AI pricing page before production launch or any cost commitment.
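To turn the listed prices into a per-request estimate, a rough sketch might look like the following. It assumes that cached tokens are the part of the prompt served from cache and are billed at the cache-read rate instead of the input rate; confirm the actual billing rules on the pricing page. `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Prices per 1M tokens copied from the table above (as of June 16, 2026).
# Re-check the model page before relying on them.
PRICE_PER_MILLION = {
    "input": Decimal("0.95"),
    "output": Decimal("4.00"),
    "cache_read": Decimal("0.19"),
}


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      cached_tokens: int = 0) -> Decimal:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is a subset of prompt_tokens and is billed
    at the cache-read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + completion_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

Under these assumptions, a 100,000-token prompt with 80,000 tokens read from cache and a 2,000-token reply comes to about $0.0422.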
Kimi K2.7 Code cURL Example
Start with a text-only request to confirm authentication, model routing, and response parsing before adding vision or tool calls.
curl "https://api.novita.ai/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${NOVITA_API_KEY}" \
  -d '{
    "model": "moonshotai/kimi-k2.7-code",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise code review assistant."
      },
      {
        "role": "user",
        "content": "List three common mistakes when implementing retry logic in Python."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.2
  }'
A successful response returns the standard chat completions shape: a choices array, a message with content, model/created metadata, and a usage object with prompt, completion, and total token counts.
Use this smoke test to verify:
- The API key is valid and the authorization header is correctly formatted.
- The model ID is accepted without a 404 or model-not-found error.
- Your client can parse choices[0].message.content.
- Token usage is logged so you can monitor cost from the first request.
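The last two checks can be sketched without any SDK, working on the decoded JSON body. This is an illustration only: `parse_chat_completion` is a hypothetical helper, the usage field names follow the standard chat completions convention, and the sample body is hand-written rather than captured from a real call.

```python
import json
import logging

logger = logging.getLogger("novita.smoke_test")


def parse_chat_completion(body: dict) -> tuple[str, dict]:
    """Return (content, usage) from a decoded chat completions response body."""
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    content = choices[0].get("message", {}).get("content") or ""
    usage = body.get("usage") or {}
    # Log token counts on every request so cost can be tracked from day one.
    logger.info(
        "tokens prompt=%s completion=%s total=%s",
        usage.get("prompt_tokens"),
        usage.get("completion_tokens"),
        usage.get("total_tokens"),
    )
    return content, usage


# Hand-written sample for illustration only; not a real API response.
SAMPLE_BODY = json.loads("""
{
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "ok"}}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}
""")
```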
Kimi K2.7 Code Python Example
The OpenAI Python SDK works with Novita AI when you set the Novita base URL. Pin the SDK version according to your own dependency policy.
import os

from openai import OpenAI

# Point the OpenAI SDK at Novita AI's OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a concise code review assistant."},
        {
            "role": "user",
            "content": "Review this Python function for off-by-one errors and missing edge cases:\n\ndef get_items(lst, start, end):\n    return lst[start:end]",
        },
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response.choices[0].message.content)
print("Tokens used:", response.usage.total_tokens)
For long coding-agent sessions, set max_tokens explicitly. Kimi K2.7 Code supports up to 262,144 output tokens, but production agents should budget token usage per turn and monitor cumulative cost across multi-step runs.
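One possible way to budget tokens across turns is a small tracker like the sketch below. `TokenBudget` is a hypothetical class, and the limits are arbitrary values you would tune per workload. It only caps the output side of each turn, so a large prompt can still push a run over budget.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-run token budget for a multi-step agent."""
    max_total_tokens: int
    max_tokens_per_turn: int
    used_tokens: int = 0

    def next_max_tokens(self) -> int:
        """Value to pass as max_tokens for the next turn; 0 means stop the run."""
        remaining = self.max_total_tokens - self.used_tokens
        return max(0, min(self.max_tokens_per_turn, remaining))

    def record(self, total_tokens: int) -> None:
        """Add the usage.total_tokens value reported for a finished turn."""
        self.used_tokens += total_tokens
```

In an agent loop, call `next_max_tokens()` before each request, stop when it returns 0, and call `record(response.usage.total_tokens)` after each response.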
Image and Video Input
Novita lists text, image, and video as input modalities for Kimi K2.7 Code. For vision input, pass a content array in the user message with a text part and an image_url part:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a UI code review assistant."},
        {
            "role": "user",
            # A content array mixes a text part with an image_url part.
            "content": [
                {
                    "type": "text",
                    "text": "Describe any accessibility issues visible in this UI screenshot and suggest CSS fixes.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        },
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
A practical order for multimodal integration:
- Confirm the text-only smoke test works first.
- Add one image input with a clearly verifiable task, such as extracting labels from a UI screenshot.
- Validate both the response quality and the response shape for your real workload.
- Test video inputs separately: start with short clips, verify the request format, and measure latency and token cost before adding video to a production path.
Do not assume every OpenAI-compatible multimodal payload is accepted identically by every Novita-hosted model. Verify the exact image and video payload shape in the current Novita AI documentation or console examples for moonshotai/kimi-k2.7-code before shipping.
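For images that live on disk rather than at a public URL, OpenAI-style APIs commonly accept a base64 data URL in the image_url part. Whether moonshotai/kimi-k2.7-code on Novita AI accepts this form is an assumption to verify with a test call. The helper names below are hypothetical.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an image_url content part that carries the image as a data URL.

    Assumption: the endpoint accepts base64 data URLs; this is not confirmed
    by the Novita AI documentation quoted in this article.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_user_message(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/png") -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part_from_bytes(image_bytes, mime_type),
        ],
    }
```

Read the file with `open(path, "rb").read()` and pass the resulting message in the `messages` list in place of the URL-based user message.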
Function Calling and Structured Outputs
Kimi K2.7 Code supports function calling through the tools parameter and structured outputs through response_format. Both are listed as features on the Novita AI model page.
Use function calling when the model should select a tool and return structured arguments instead of answering in prose:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

# One tool definition: a JSON Schema describing the function's arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "file_search",
            "description": "Search the repository for files matching a pattern.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {
                        "type": "string",
                        "description": "Glob pattern to match, e.g. '**/*.py'",
                    },
                    "directory": {
                        "type": "string",
                        "description": "Root directory to search within.",
                    },
                },
                "required": ["pattern"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a repository analysis assistant."},
        {
            "role": "user",
            "content": "Find all Python files in the src directory that might contain database migration logic.",
        },
    ],
    tools=tools,
    tool_choice="auto",
    temperature=0.1,
)

message = response.choices[0].message

# The model either requests one or more tool calls or answers in prose.
if message.tool_calls:
    for call in message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Arguments: {call.function.arguments}")
else:
    print(message.content)
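Printing the tool call is only the first half of the loop. The sketch below shows one way to execute a call locally and build the follow-up message, assuming the standard OpenAI-style `tool` role with a `tool_call_id`. The local `file_search` implementation and the registry are hypothetical stand-ins for your own tools.

```python
import json
from pathlib import Path


def file_search(pattern: str, directory: str = ".") -> list[str]:
    """Hypothetical local implementation of the file_search tool."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


TOOL_REGISTRY = {"file_search": file_search}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the tool message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        # Model-generated arguments arrive as a JSON string and may be invalid.
        args = json.loads(arguments_json or "{}")
        result = func(**args)
    except (json.JSONDecodeError, TypeError, ValueError, NotImplementedError) as exc:
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


def tool_result_message(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Build the message that reports a tool result back to the model."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": run_tool_call(name, arguments_json),
    }
```

To continue the conversation, append the assistant message that contained the tool calls, then one tool result message per call (using `call.id`, `call.function.name`, and `call.function.arguments`), and send the request again. Returning errors as JSON rather than raising lets the model see and correct bad arguments.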
For structured outputs, use response_format with json_schema when you need a validated JSON response without a tool call. Keep early schemas small and test your parser against the exact response shape moonshotai/kimi-k2.7-code returns before relying on strict mode in production.
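A small schema and a defensive parser might look like the sketch below. The `response_format` shape follows the OpenAI json_schema convention; whether Novita AI accepts it unchanged for this model, including `strict`, is an assumption to verify. The schema, `RESPONSE_FORMAT`, and `parse_review` are hypothetical examples.

```python
import json

# Hypothetical schema for a short code review verdict.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["severity", "summary"],
    "additionalProperties": False,
}

# Assumed to be passed as response_format=RESPONSE_FORMAT in the request.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "code_review", "schema": REVIEW_SCHEMA, "strict": True},
}


def parse_review(content: str) -> dict:
    """Parse the model's reply and check it against the parts of the schema we rely on."""
    data = json.loads(content)  # raises a ValueError subclass on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REVIEW_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if data["severity"] not in REVIEW_SCHEMA["properties"]["severity"]["enum"]:
        raise ValueError(f"unexpected severity: {data['severity']!r}")
    return data
```

Validating on the client side as well keeps the parser safe if strict mode behaves differently from what you expect.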
Kimi K2.7 Code’s interleaved thinking architecture means it reasons through multi-step tasks before returning a result. For agentic coding workflows with multiple tool calls per turn, test how tool choice, argument quality, and response latency behave on your actual task set before routing production traffic.
Production Testing Checklist
Kimi K2.7 Code is priced separately for input, output, and cache read tokens. Cost profiles vary significantly by workload:
- Long-context code review: large input token counts dominate cost.
- Code generation agents: output token usage scales with response length and the number of turns.
- Repeated-context workflows: cache-read pricing applies when a stable system prompt, tool schema, or repository summary recurs across many calls.
Before production, run an evaluation set that includes:
- Short text-only prompts (latency baseline and authentication check).
- Long-context prompts near your expected working size, not the maximum window.
- Tool-call prompts where the correct behavior is to call a function with valid arguments.
- Image inputs that match your real upload source and file handling.
- Failure cases: oversized input, missing media URL, invalid API key, and timeout behavior.
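The evaluation set above can be driven by a small harness such as this sketch. It is deliberately SDK-agnostic: `ask` is any callable that sends a prompt and returns the reply text, and `EvalCase`, `EvalResult`, and `run_eval` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the reply is acceptable


@dataclass
class EvalResult:
    name: str
    passed: bool
    latency_s: float
    error: str = ""


def run_eval(cases: list[EvalCase], ask: Callable[[str], str]) -> list[EvalResult]:
    """Run every case, recording latency and failures instead of aborting."""
    results = []
    for case in cases:
        start = time.perf_counter()
        try:
            reply = ask(case.prompt)
            passed, error = bool(case.check(reply)), ""
        except Exception as exc:  # failure cases are part of the evaluation
            passed, error = False, f"{type(exc).__name__}: {exc}"
        results.append(
            EvalResult(case.name, passed, time.perf_counter() - start, error)
        )
    return results
```

In practice, `ask` would wrap a `client.chat.completions.create` call with a timeout, and the failure cases would assert on the recorded `error` text rather than on the reply.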
Feature lists describe what is available. Evaluation on your real workload tells you whether latency, token usage, tool argument quality, and output correctness meet your production bar.
FAQ
Is Kimi K2.7 Code available through Novita AI?
Yes. Novita AI lists Kimi K2.7 Code as a Serverless LLM with the API model ID moonshotai/kimi-k2.7-code.
What is the correct model ID?
Use moonshotai/kimi-k2.7-code in all API calls.
What endpoint should I use?
Use the OpenAI-compatible chat completions endpoint: POST https://api.novita.ai/openai/v1/chat/completions. Set the base URL to https://api.novita.ai/openai when using an OpenAI SDK client.
How much does Kimi K2.7 Code cost?
As of June 16, 2026, Novita AI lists $0.95 per 1M input tokens, $4.00 per 1M output tokens, and $0.19 per 1M cache read input tokens. Verify current prices at the Kimi K2.7 Code model page before any procurement decision.
Does it support image and video input?
Novita lists text, image, and video as input modalities. For the exact payload shape, verify with current Novita documentation or a test call before shipping multimodal features.
Does Kimi K2.7 Code support function calling?
Yes. Use the tools parameter in the chat completions request. Novita lists function calling and structured outputs as supported features.
What is the context window?
The context window is 262,144 tokens, and the maximum output is also 262,144 tokens, as listed on the Novita AI model page.
