- Why Sandbox Migration Questions Matter
- API and SDK Compatibility
- Code Interpreter Compatibility
- Session Lifecycle and Timeout Behavior
- File and State Persistence
- Package Install Policies
- Network Policies and Egress
- Secrets Handling in Sandboxes
- Concurrency and Scaling Limits
- Observability and Logging
- Migration Path and Effort
- Pricing Model Differences
- Novita Agent Sandbox Fit
- FAQ
When evaluating an E2B-compatible or E2B-alternative sandbox for an AI application, check API surface overlap, SDK interface compatibility, code interpreter session behavior, file and state lifecycle, package install policies, network egress controls, session duration and concurrency limits, and pricing model before committing to a migration. Each of these checks takes no more than an afternoon of testing, yet skipped checks are the most common source of post-migration surprises in production.
Why Sandbox Migration Questions Matter
Sandbox providers look similar at the surface level. They all offer isolated execution, some form of Python support, and a REST or SDK interface. But the details diverge quickly once you try to run a real agent workload: a coding agent that needs a persistent filesystem across tool calls, a data analysis workflow that installs pandas at runtime, or a browser agent that needs outbound HTTPS to a specific API.
A useful migration checklist is not a feature matrix. It is a set of questions you run against your actual application’s requirements before deciding whether a provider swap is low-friction or a full re-architecture.
This guide walks through each category with the questions worth asking, what to look for in provider documentation, and how Novita Agent Sandbox addresses each dimension for teams evaluating it as a migration target.
API and SDK Compatibility
Questions to ask:
- Does the target provider offer an official SDK for your language (Python, TypeScript, Go)?
- Does the SDK expose the same high-level primitives you depend on: sandbox creation, code execution, file operations, process management?
- Is the REST API surface stable and versioned, or is it under rapid change?
- What authentication flow does the SDK use (API key header, OAuth, service account)? Does that match your existing secrets management?
What to look for: SDK documentation that covers sandbox lifecycle methods, file system methods, and process methods explicitly. A provider with only a generic REST API and no maintained SDK will require more glue code on your end.
Where differences surface in practice: E2B’s Python SDK wraps sandbox creation, code execution with sandbox.run_code(), and filesystem operations. If your application calls specific method names or relies on streaming output behavior from the SDK, test those paths on the target provider before assuming they work.
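One way to make these checks concrete is to write down the primitives your application depends on as an interface, then run the same smoke test against each candidate provider. The sketch below is illustrative only: `SandboxClient`, `FakeSandbox`, and `smoke_test` are hypothetical names, not part of E2B's or any other provider's SDK. A real adapter would forward each method to the provider's actual calls.

```python
from typing import Protocol


class SandboxClient(Protocol):
    """Hypothetical interface: the primitives this application relies on."""

    def run_code(self, code: str) -> str: ...
    def write_file(self, path: str, data: bytes) -> None: ...
    def read_file(self, path: str) -> bytes: ...
    def close(self) -> None: ...


class FakeSandbox:
    """In-memory stand-in so the smoke test itself can be exercised offline."""

    def __init__(self) -> None:
        self.files = {}
        self.closed = False

    def run_code(self, code: str) -> str:
        # A real adapter would forward to the provider SDK; this only echoes.
        return f"ran {len(code)} chars"

    def write_file(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read_file(self, path: str) -> bytes:
        return self.files[path]

    def close(self) -> None:
        self.closed = True


def smoke_test(client: SandboxClient) -> list:
    """Exercise each primitive once and return the names of failed checks."""
    failures = []
    try:
        if not client.run_code("print(1)"):
            failures.append("run_code")
        client.write_file("/tmp/probe.txt", b"probe")
        if client.read_file("/tmp/probe.txt") != b"probe":
            failures.append("file_round_trip")
    finally:
        # Always release the sandbox, even when a check raises.
        client.close()
    return failures
```

Writing one adapter per provider behind the same interface keeps the comparison honest: the smoke test does not change, only the adapter does.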
Code Interpreter Compatibility
Questions to ask:
- Does the sandbox support interactive Python execution (REPL-style, not just script execution)?
- How are standard output, standard error, and the execution result separated?
- Can the sandbox produce charts, figures, or binary output (PNG, SVG, HTML) from Python code?
- What Python version is the default, and is it configurable?
- Can the code interpreter run arbitrary shell commands, or is execution restricted to Python?
What to look for: Many AI application frameworks assume a streaming or structured execution result that separates stdout, stderr, display data (Jupyter-style rich output), and execution errors. If your agent parses that structure, a provider that returns only a flat text response will require you to rewrite the parsing layer.
Streaming execution results: Some providers stream partial output as the code runs. Others return a single response object after execution completes. For short code snippets this rarely matters, but for long-running data processing steps, streaming partial output is often important for user experience.
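To see how much a flat text response costs you, it can help to normalize every provider's output into one internal shape and note which fields stay empty. The following is a minimal sketch under assumptions: `ExecutionResult` and its field names, and the payload keys `stdout`, `stderr`, `results`, and `error`, are hypothetical and do not describe any specific provider's response format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionResult:
    """Hypothetical normalized shape; field names are illustrative."""

    stdout: str = ""
    stderr: str = ""
    display: list = field(default_factory=list)  # rich outputs such as images
    error: Optional[str] = None


def from_flat_text(text: str) -> ExecutionResult:
    """Wrap a flat text response. Only stdout can be recovered from it."""
    return ExecutionResult(stdout=text)


def from_structured(payload: dict) -> ExecutionResult:
    """Map a structured payload (assumed keys) onto the normalized shape."""
    return ExecutionResult(
        stdout="".join(payload.get("stdout", [])),
        stderr="".join(payload.get("stderr", [])),
        display=list(payload.get("results", [])),
        error=payload.get("error"),
    )
```

If your agent branches on `error` or renders items from `display`, the flat-text path shows exactly which behavior would be lost after a migration.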
Session Lifecycle and Timeout Behavior
Questions to ask:
- What is the default and maximum session timeout?
- Does the provider support pausing and resuming a session (state preserved across interruptions)?
- What happens to in-progress execution if a session times out?
- Is session creation synchronous or asynchronous from your application’s perspective?
- How do you explicitly terminate a session, and what cleanup happens automatically?
What to look for: Coding agents and multi-step data analysis workflows often need sessions that outlast a single LLM turn. A provider with a 60-second default timeout and no pause/resume support forces your agent to serialize all state before each turn ends — a significant architecture constraint.
On pause/resume specifically: Pause/resume is different from creating a new session with a snapshot. Pause/resume preserves in-memory process state, open file handles, and loaded libraries. Snapshot-and-restore restores a filesystem image but typically does not preserve running processes. Know which mechanism a provider offers and which your agent actually needs.
Session creation latency: If your agent creates a new sandbox per tool call, startup latency compounds quickly. Check provider documentation for cold-start versus warm-pool behavior and whether you can pre-warm sessions.
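The compounding effect is simple arithmetic, sketched below. The function name and the numbers in the usage example are illustrative assumptions, not measurements from any provider.

```python
def startup_overhead_seconds(tool_calls: int, startup_s: float, reuse_session: bool) -> float:
    """Total time spent waiting on sandbox startup over one agent run."""
    if tool_calls <= 0:
        return 0.0
    # One creation when the session is reused, otherwise one per tool call.
    creations = 1 if reuse_session else tool_calls
    return creations * startup_s


# Illustrative numbers: 40 tool calls with a 0.5 s startup each.
per_call = startup_overhead_seconds(40, 0.5, reuse_session=False)
reused = startup_overhead_seconds(40, 0.5, reuse_session=True)
```

Plug in the startup time you measure on the target provider and your agent's typical tool-call count to see whether per-call creation is affordable.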
File and State Persistence
Questions to ask:
- Does the sandbox have a persistent filesystem across code execution steps within a session?
- Can files created in one session be accessed in a subsequent session, or is the filesystem ephemeral per session?
- Is there a file upload/download API, or must files be passed inline?
- What are the filesystem size limits (disk quota per session)?
- If your agent generates large artifacts (models, datasets), how does file export work?
What to look for: Most sandboxes offer a per-session ephemeral filesystem. If your workflow requires cross-session persistence (for example, a coding agent that builds an artifact over multiple user interactions), check whether the provider supports named volumes, persistent workspaces, or a documented export-and-restore pattern.
File I/O in code interpreter mode: For data analysis agents, the ability to write a CSV or PNG inside the sandbox and then download it to your application is a core primitive. Test that the round-trip works end-to-end: upload a file, run code that reads and transforms it, download the result.
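The shape of such a round-trip test can be sketched locally. In the example below a temporary directory and a local Python subprocess stand in for the sandbox, which is an assumption made so the sketch runs offline; in a real test each of the three marked steps would call the target provider's file-write, code-execution, and file-read APIs instead.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

# Code executed "inside the sandbox": doubles the numeric column.
TRANSFORM = """
import csv
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for name, value in csv.reader(src):
        writer.writerow([name, int(value) * 2])
"""


def round_trip(rows):
    """Upload rows, run a transform next to them, download the result."""
    with tempfile.TemporaryDirectory() as workdir:
        # Upload: a real test would call the provider's file-write API here.
        with open(Path(workdir) / "input.csv", "w", newline="") as f:
            csv.writer(f).writerows(rows)
        # Execute: a real test would call the provider's run-code API here.
        subprocess.run([sys.executable, "-c", TRANSFORM], cwd=workdir, check=True)
        # Download: a real test would call the provider's file-read API here.
        with open(Path(workdir) / "output.csv", newline="") as f:
            return list(csv.reader(f))
```

Asserting on the downloaded content, rather than on the absence of errors, catches encoding and path-handling differences between providers.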
Package Install Policies
Questions to ask:
- Can sandboxed code run `pip install` at runtime without restrictions?
- Does the provider allow custom base images or pre-installed package environments?
- Is there a mechanism to allowlist or denylist packages?
- Does package installation survive across tool calls within a session, or is it per-execution?
- What happens when a package install fails (missing system deps, version conflict)?
What to look for: Runtime package installation is one of the most common sandbox divergence points. Some providers install packages into a persistent session layer, so pip install pandas in step 1 is available in step 2. Others reset to a base image on each code block. If your agent assumes installation persists, this is a breaking assumption.
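A persistence probe makes this assumption testable: install in one step, then check importability in a separate step. In the sketch below, `run_step` is a hypothetical callable that executes code in one sandbox session and returns its output as text, and `FakeSession` is a stand-in that mimics both behaviors so the probe can be tried offline. Neither reflects a real provider's API.

```python
from typing import Callable


def install_persists(run_step: Callable[[str], str], module: str = "pandas") -> bool:
    """Install in one step, import-check in the next, report the outcome."""
    run_step(
        "import subprocess, sys\n"
        f"subprocess.run([sys.executable, '-m', 'pip', 'install', {module!r}], check=True)"
    )
    output = run_step(
        "import importlib.util\n"
        f"print('FOUND' if importlib.util.find_spec({module!r}) else 'MISSING')"
    )
    return "FOUND" in output


class FakeSession:
    """Stand-in session; persistent=False mimics a reset between code blocks."""

    def __init__(self, persistent: bool) -> None:
        self.persistent = persistent
        self.installed = False

    def run(self, code: str) -> str:
        if "'install'" in code:
            # A resetting provider would discard the install after this step.
            self.installed = self.persistent
            return ""
        return "FOUND" if self.installed else "MISSING"
```

Against a real provider, pass a function that sends the code to a single live session, and run the probe once per base image you plan to use.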
Supply chain note: Allowing arbitrary pip install at runtime has supply chain implications. For production workloads, ask whether the provider allows internet-restricted installs (from a private PyPI mirror or a curated allowlist) rather than open pip install from the public internet.
Network Policies and Egress
Questions to ask:
- Is outbound internet access enabled by default, or is the sandbox network-isolated?
- Can sandboxed code make HTTP requests to external APIs?
- Is there a configurable allowlist or denylist for egress destinations?
- What happens to DNS resolution inside the sandbox?
- Can two sandboxes communicate directly, or only through your application layer?
What to look for: For a data analysis agent that fetches public datasets, open egress is convenient. For a coding agent running inside a security-sensitive environment, controlled or blocked egress is the right default. Know which your workload needs.
Browser agents vs. code execution agents: Browser agents typically need full internet access (to navigate to URLs the user specifies). Code execution agents often only need access to specific APIs. These are different egress profiles that may require different sandbox configurations.
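Where a provider offers no configurable egress allowlist, an application-level check on the URLs your agent is about to request is a partial stopgap. The sketch below is an assumption-laden illustration: `egress_allowed` is a hypothetical helper, and it does not replace network-level enforcement, because code running inside the sandbox can bypass any check performed outside it.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, allowed_hosts) -> bool:
    """Return True if the URL's host is an allowed host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    # Match on whole labels so "example.com.evil.test" does not pass.
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

A code execution agent might carry a short allowlist of API hosts, while a browser agent would typically skip this check and rely on other controls.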
Secrets Handling in Sandboxes
Questions to ask:
- How do you inject secrets (API keys, credentials) into a sandbox at creation time?
- Are injected secrets accessible as environment variables, mounted files, or both?
- Are secrets visible in execution logs or serialized state?
- Does the provider scrub secrets from log output automatically?
What to look for: The most common mistake is injecting a secret via an environment variable and then having the sandbox log all environment variables on startup, leaking the secret to your observability stack. Ask whether the provider has any scrubbing behavior, and build application-level scrubbing if not.
Difference from general env vars: Not all environment variables are secrets. Providers that treat the two interchangeably (no typed secrets, no redaction) require more defensive coding on your side.
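Application-level scrubbing can be as simple as replacing known secret values before log text leaves your process. This is a minimal sketch, assuming you hold the injected secrets in a name-to-value mapping; `scrub` is a hypothetical helper, and exact-match replacement will not catch secrets that the sandboxed code has encoded or split.

```python
def scrub(text: str, secrets: dict) -> str:
    """Replace every known secret value in `text` with a named placeholder."""
    # Longest values first, so a secret that contains another is redacted whole.
    ordered = sorted(secrets.items(), key=lambda item: len(item[1]), reverse=True)
    for name, value in ordered:
        if value:  # skip empty values, which would match everywhere
            text = text.replace(value, f"[REDACTED:{name}]")
    return text
```

Run this over stdout, stderr, and any serialized session state before forwarding them to your observability stack.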
Concurrency and Scaling Limits
Questions to ask:
- What is the default and maximum concurrent sandbox limit per account?
- Is concurrency enforcement hard (requests fail above the limit) or soft (requests queue)?
- Are there per-region or per-datacenter concurrency caps?
- Is there a sandbox-per-user isolation model, or do all sandboxes share account-level limits?
- What is the burst behavior when you spike from 0 to 100 concurrent sandboxes?
What to look for: Evaluation workloads, RL environments, and multi-tenant coding platforms all require high concurrency. A provider with a free tier capped at 5 or 10 concurrent sandboxes is viable for prototyping but not for production RL runs with 50–100 parallel trials.
Account vs. organizational limits: Some providers enforce limits per API key and allow multiple keys per organization. Others enforce limits at the organization level regardless of key count. For high-concurrency workloads, this distinction affects how you structure your production account.
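If the provider enforces a hard limit, capping concurrency on the client side keeps a burst from turning into failed requests. The sketch below uses `asyncio.Semaphore` from the Python standard library. `run_with_cap` is a hypothetical helper, and each job is assumed to be a zero-argument coroutine function that creates, uses, and terminates one sandbox.

```python
import asyncio


async def run_with_cap(jobs, limit: int):
    """Run all jobs, never more than `limit` at once. Returns (results, peak)."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)  # track the highest observed concurrency
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak
```

Set `limit` slightly below the account limit you confirm with the provider, so that retries and stray sessions still have headroom.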
Observability and Logging
Questions to ask:
- What execution logs does the provider expose: stdout, stderr, system events, network traffic?
- Are logs streamed in real time or available only after execution completes?
- How long are logs retained?
- Is there a structured log API (JSON, searchable fields) or only plaintext?
- Can you attach your own observability stack (OpenTelemetry, Datadog, Splunk)?
What to look for: For debugging agent failures in production, you want to know what code ran, what it printed, what files it created, and what network calls it made. Providers that expose only stdout/stderr and nothing else make root-cause analysis slow.
Audit trail requirements: If your use case involves regulated data or compliance requirements, ask whether the provider can produce an audit log of all execution events with timestamps. Plaintext stdout is not an audit trail.
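Where a provider exposes only plaintext output, an application-side structured record per execution is one way to get searchable fields. The sketch below is illustrative: `execution_event` and its field names are assumptions, not a provider log schema, and a record built by your own application is a debugging aid rather than a provider-attested audit log.

```python
import hashlib
import json


def execution_event(session_id, code, stdout, stderr, files_written, timestamp):
    """Build one JSON log line describing a single code execution."""
    record = {
        "event": "code_execution",
        "session_id": session_id,
        "timestamp": timestamp,  # ISO 8601 string supplied by the caller
        # Hash instead of raw code, so the log line stays small and searchable.
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "stdout_bytes": len(stdout.encode("utf-8")),
        "stderr_bytes": len(stderr.encode("utf-8")),
        "files_written": sorted(files_written),
    }
    return json.dumps(record, sort_keys=True)
```

Emitting one such line per tool call lets you join agent traces with sandbox activity by `session_id` in whatever log backend you already use.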
Migration Path and Effort
Before committing to a migration, scope the actual work across these dimensions:
SDK layer: If the target provider has an official SDK with similar method names, the application-layer changes may be limited to initialization, authentication, and a few method signatures. If the target only offers a REST API, you are writing an adapter layer.
Session and state model: If your current provider has pause/resume and the target does not, you need to redesign how your agent handles multi-turn state. This is rarely a small change.
Package environment: If your current provider uses a custom base image with pre-installed packages, rebuilding that environment on a new provider takes time and testing.
Testing: Any sandbox migration should include an integration test suite that runs your actual agent workloads end-to-end on the target provider before switching production traffic. Unit tests that mock the sandbox are not enough, because the behavioral differences between providers only show up in the real execution environment.
A rough effort signal: If the target provider has an SDK that wraps the same primitives (create, run code, list files, download file, terminate), and if your session model is stateless per turn, migration is often a 1–2 day effort. If you rely on pause/resume, custom base images, or specific streaming output behavior, budget a week or more for design, implementation, and testing.
Pricing Model Differences
Sandbox pricing models vary significantly, and the right model depends on your workload shape.
Common pricing dimensions:
| Dimension | Where it fits or what it means |
|---|---|
| Per-second billing | Workloads where sessions are short and idle time is low |
| Per-minute billing | Workloads where small billing increments matter less |
| Subscription floor | Fixed monthly cost regardless of usage |
| vCPU + memory billing | Customizable resource allocation; you pay for what you configure |
| Flat per-execution billing | Predictable cost for uniform task sizes |
Questions to ask:
- Is billing usage-based (per second/minute of active sandbox time) or subscription-based (monthly minimum)?
- Are vCPU and memory billed independently, or is billing tied to fixed tiers?
- What counts as a billable second — session creation time, active code execution time, or total session wall clock time?
- Is there a free tier, and what are its limits for your workload type?
- Is there a cost difference between cold-start sessions and pre-warmed sessions?
How pricing diverges in practice: A provider that charges from session creation through session termination (regardless of whether code is actively running) will be more expensive for workloads with long idle periods between agent turns. A provider that bills only during active execution is cheaper for those workloads, but that billing model may not be available at the resource tier you need.
For high-concurrency RL or evaluation workloads, cost-per-thousand-runs often matters more than per-second rate. Run the math on a realistic monthly run count before selecting a provider.
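The comparison between wall-clock billing and active-only billing can be sketched as follows. The function name, the per-second rate, and the durations in the usage example are placeholder assumptions for illustration, not any provider's published prices.

```python
def cost_per_thousand_runs(wall_clock_s: float, active_s: float,
                           rate_per_s: float, bill_idle: bool) -> float:
    """Cost of 1,000 runs under wall-clock or active-only billing."""
    # Wall-clock billing charges idle time too; active-only billing does not.
    billable_s = wall_clock_s if bill_idle else active_s
    return billable_s * rate_per_s * 1000


# Placeholder inputs: 300 s sessions with 30 s of actual execution each.
wall_clock_cost = cost_per_thousand_runs(300, 30, 0.0001, bill_idle=True)
active_only_cost = cost_per_thousand_runs(300, 30, 0.0001, bill_idle=False)
```

With these placeholder inputs the wall-clock model costs ten times as much, which is simply the ratio of session length to active time. Substitute your own measured durations and the provider's current rates.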
Novita Agent Sandbox Fit
Novita Agent Sandbox is one of the primary migration targets this checklist is written for. It targets coding agent, browser agent, data analysis, evaluation, and RL workloads. For teams working through this checklist, here is where Novita fits and where gaps may exist:
SDK and API: Novita provides a Python SDK and REST API for sandbox creation, code execution, filesystem operations, and process management. Teams migrating from E2B-style workflows will find familiar primitives. Verify specific method names against Novita Sandbox documentation for your target SDK version.
Session lifecycle: Novita supports sessions up to 24 hours, Pause/Resume, and Autopause/Autoresume for idle sessions. For multi-turn coding agents that need to preserve in-session state across LLM calls, this is a meaningful operational difference from providers with 60-minute limits.
Concurrency: Novita’s base tier supports 50 concurrent sandboxes with no subscription fee. For evaluation or RL workloads that need higher concurrency, contact Novita for enterprise tiers.
Pricing model: Novita bills per-second on actual vCPU and memory, with no subscription minimum. For workloads with variable or bursty usage patterns, usage-based billing without a floor is often cheaper than subscription-based alternatives. Verify current rates at the Novita AI pricing page before making cost projections.
BYOC deployment: Novita supports running sandboxes inside your own AWS or GCP VPC. For teams with strict network isolation requirements, this avoids the multi-tenant public cloud model.
Where to verify carefully: E2B API/SDK compatibility, drop-in replacement guarantees, and specific capability parity are subject to ongoing development. Do not assume full compatibility without testing your specific workload patterns against Novita’s current SDK.
Where Novita may not fit: Teams with deep investment in E2B-specific SDK abstractions, teams needing GPU sandbox support, or teams requiring on-premises deployment outside AWS/GCP should evaluate fit carefully before migrating.
FAQ
Is Novita Agent Sandbox a drop-in replacement for E2B?
Not by assumption. SDK method names, session lifecycle behavior, streaming output structure, and package install persistence all need to be tested against your specific workload before treating any provider as a drop-in replacement. Use the checklist in this guide to verify each dimension explicitly.
What is the minimum effort to migrate from E2B to a different sandbox provider?
If the target provider has an official SDK with similar primitives (create, run code, file operations, terminate) and your session model is stateless per turn, migration is often a 1–2 day effort covering SDK initialization, authentication, and a small number of method signatures. If you rely on pause/resume, custom base images, or specific streaming output behavior, budget a week or more.
Does Novita Agent Sandbox support pause and resume?
Yes. Novita supports Pause/Resume and Autopause/Autoresume for idle sessions, with session lengths up to 24 hours. This is relevant for multi-turn coding agents that need to preserve in-session state across LLM calls. Verify current behavior against Novita Sandbox documentation for your SDK version.
How do I test whether a target sandbox provider is compatible with my application?
Run your actual agent workloads end-to-end on the target provider in a staging environment before switching production traffic. Test the specific SDK methods your application calls, the streaming output structure your parser expects, package install persistence across tool calls, and file round-trips (upload, transform, download). Unit tests that mock the sandbox are not enough — compatibility differences appear only in real execution.
Does Novita support running sandboxes inside a private cloud account?
Yes. Novita supports BYOC (Bring Your Own Cloud) deployment inside your own AWS or GCP VPC. For teams with strict network isolation, data residency, or compliance requirements, this avoids the multi-tenant public cloud model. Contact Novita for current BYOC availability and configuration options.
