- What is a coding agent sandbox?
- Coding agent sandbox architecture
- How should terminal access work in a coding agent sandbox?
- Repository isolation and branch control for agent changes
- Command, package, and network policies for sandboxed coding agents
- Secrets, logs, and audit trails for agent workspaces
- Diffs, previews, and review gates before merge
- Cleanup and reset strategy for long-running agent sessions
- Where Novita Agent Sandbox fits in this workflow
- Coding agent sandbox implementation checklist
- FAQ
Run a coding agent in a sandbox by giving it a scoped repository workspace, a controlled terminal execution path, explicit file permissions, network and package-install policies, isolated secrets, command logs, artifacts, and a clear approval path for high-risk changes before merge or deployment. That pattern works whether the agent is Codex-style, IDE-connected, CI-triggered, or embedded in your own developer platform: the model can plan and edit, but the sandbox decides what it can touch, what it can run, what it can fetch, and what evidence a reviewer receives.
What is a coding agent sandbox?
A coding agent sandbox is an isolated runtime where an AI system can inspect code, edit files, run terminal commands, install dependencies when policy allows, execute tests, start preview servers, and return a reviewable diff without receiving broad access to the developer’s machine or production environment.
The important shift is that the sandbox is not just a chat wrapper around a model. It is the operating boundary for the work. The model proposes actions; the sandbox enforces the workspace, tools, permissions, and evidence trail.
For a simple code assistant, a local checkout and manual copy-paste may be enough. For an agent that can run commands or continue for many steps, you need stronger boundaries:
- A dedicated workspace for each task or session.
- A known repository state and branch.
- A command execution interface with approvals for risky operations.
- A package-install policy for npm, pip, cargo, apt, and similar tools.
- Network egress rules for registries, docs, APIs, and preview access.
- Secrets that are scoped to the task and hidden from logs where possible.
- Captured stdout, stderr, exit codes, file changes, generated artifacts, and preview URLs.
- A review gate before merge, deployment, or external release.
This is why “run Codex in a sandbox” should be understood as an infrastructure pattern, not a single CLI flag or one vendor integration. Codex CLI itself is documented as a coding agent that runs locally on your computer, and OpenAI’s Codex documentation describes a terminal-oriented workflow. If you operate that kind of agent for a team, CI system, or product workflow, the surrounding execution environment becomes the control plane.
Coding agent sandbox architecture
The cleanest architecture separates the model loop from the execution boundary:
| Layer | Responsibility | Questions to answer |
|---|---|---|
| Agent interface | Turns user intent into plans, file edits, tool calls, and review summaries | Which model or coding agent is used? How are prompts, context, and tool schemas managed? |
| Workspace manager | Creates the sandbox, checks out the repo, sets the branch, and mounts allowed files | Is each task isolated? Is the base commit known? Can the workspace be reset? |
| Terminal runner | Executes approved commands and streams results back to the agent | Which commands are allowed automatically, require approval, or are blocked? |
| Policy layer | Controls filesystem scope, secrets, network egress, package installs, runtime limits, and cleanup | Can the agent fetch packages? Can it call the public internet? Can it read credentials? |
| Evidence layer | Stores logs, diffs, test results, previews, and artifacts | Can a reviewer reconstruct what happened without trusting the model’s summary? |
| Review gate | Requires a human or trusted automation step before merge, publish, or deployment | Who approves risky changes? What checks must pass first? |
In practice, a single platform may combine several of these layers. The architecture still matters because it keeps product choices honest. If a tool gives an agent a terminal but cannot show command logs, file diffs, or egress policy, it may be convenient for prototyping but gives reviewers too little evidence for production use.
How should terminal access work in a coding agent sandbox?
The terminal is where a coding agent becomes operationally useful and operationally risky. It can run tests, build assets, inspect generated files, start local servers, and diagnose failures. It can also delete files, leak environment variables, run unexpected install scripts, or consume large compute resources.
A good terminal model has three parts.
First, define command classes. Read-only commands such as ls, rg, git diff, sed without in-place editing, and test-status commands can often run automatically. Build and test commands such as npm test, pytest, cargo test, and npm run build may be allowed with timeouts. Destructive or external-impact commands such as rm -rf, git push, gh pr merge, deployment CLIs, package publishing, database migration, or cloud-resource mutation should require explicit approval or be blocked entirely.
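The command classes can be sketched as a small classifier. This is a minimal illustration, not a complete policy engine: the `Decision` enum, the `RULES` table, and the `classify` function are hypothetical names, and the sketch assumes commands are executed as argument lists without a shell, so any shell metacharacter is escalated to approval.

```python
import shlex
from enum import Enum


class Decision(Enum):
    AUTO = "auto"
    AUTO_WITH_TIMEOUT = "auto_with_timeout"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical policy table; a real deployment would load this per repository.
# Rules are checked in order, so the riskiest matching class wins.
RULES = [
    (Decision.NEEDS_APPROVAL, [("rm", "-rf"), ("git", "push"), ("gh", "pr", "merge")]),
    (Decision.AUTO_WITH_TIMEOUT, [("npm", "test"), ("pytest",), ("cargo", "test"),
                                  ("npm", "run", "build")]),
    (Decision.AUTO, [("ls",), ("rg",), ("git", "diff")]),
]

# Characters that could chain or redirect commands if a shell were involved.
SHELL_METACHARACTERS = set(";&|<>$`\n")


def classify(command: str) -> Decision:
    """Map a command line to a decision; anything unrecognized needs approval."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return Decision.NEEDS_APPROVAL
    try:
        argv = tuple(shlex.split(command))
    except ValueError:  # for example, unbalanced quotes
        return Decision.NEEDS_APPROVAL
    for decision, prefixes in RULES:
        if any(argv[: len(prefix)] == prefix for prefix in prefixes):
            return decision
    return Decision.NEEDS_APPROVAL


for cmd in ["git diff HEAD~1", "npm run build", "git push origin main", "ls && git push"]:
    print(cmd, "->", classify(cmd).value)
```

Prefix matching is deliberately coarse. The design choice that matters is the default: a command that matches no rule falls through to approval rather than running automatically.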
Second, stream results with structure. The agent and reviewer should see command, working directory, start time, exit code, stdout, stderr, timeout state, and truncated-output policy. A screenshot of a terminal is not enough; the system should preserve machine-readable logs.
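One way to keep results machine-readable is to wrap every execution in a record type. The sketch below uses only the Python standard library; `CommandRecord`, `run_logged`, and the 10,000-character truncation limit are assumptions made for illustration, not part of any particular sandbox API.

```python
import json
import subprocess
import sys
import time
from dataclasses import asdict, dataclass
from typing import List, Optional

MAX_OUTPUT_CHARS = 10_000  # assumed truncation limit


@dataclass
class CommandRecord:
    argv: List[str]
    cwd: str
    started_at: float
    duration_s: float
    exit_code: Optional[int]  # None when the command timed out
    timed_out: bool
    stdout: str
    stderr: str
    stdout_truncated: bool
    stderr_truncated: bool


def _as_text(value) -> str:
    # On timeout, captured output can arrive as bytes or None.
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def run_logged(argv: List[str], cwd: str, timeout_s: float) -> CommandRecord:
    """Run a command without a shell and return a structured record."""
    started_at = time.time()
    t0 = time.monotonic()
    try:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True,
                              text=True, timeout=timeout_s)
        exit_code, timed_out = proc.returncode, False
        out, err = proc.stdout, proc.stderr
    except subprocess.TimeoutExpired as exc:
        exit_code, timed_out = None, True
        out, err = _as_text(exc.stdout), _as_text(exc.stderr)
    return CommandRecord(
        argv=list(argv), cwd=cwd, started_at=started_at,
        duration_s=round(time.monotonic() - t0, 3),
        exit_code=exit_code, timed_out=timed_out,
        stdout=out[:MAX_OUTPUT_CHARS], stderr=err[:MAX_OUTPUT_CHARS],
        stdout_truncated=len(out) > MAX_OUTPUT_CHARS,
        stderr_truncated=len(err) > MAX_OUTPUT_CHARS,
    )


record = run_logged([sys.executable, "-c", "print('ok')"], cwd=".", timeout_s=30)
print(json.dumps(asdict(record), indent=2))
```

Because the record serializes to JSON, the same object can be streamed to the agent and stored for the reviewer.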
Third, handle long-running sessions deliberately. Coding agents often need a background dev server, watcher, browser automation process, or integration-test stack. Treat long-running processes as resources with handles: start them, stream logs, expose only the required preview port, and stop them during cleanup. Do not let a background process become an untracked side effect of a chat session.
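Treating background processes as resources with handles might look like the registry below. `ProcessRegistry` is a hypothetical helper built on the standard `subprocess` module. It covers starting, logging, and stopping, and leaves port exposure to whatever the sandbox platform provides.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


class ProcessRegistry:
    """Tracks every background process by name so cleanup can stop them all."""

    def __init__(self) -> None:
        self._procs: Dict[str, Tuple[subprocess.Popen, object]] = {}

    def start(self, name: str, argv: List[str], log_path: str = os.devnull) -> int:
        if name in self._procs:
            raise ValueError(f"process {name!r} is already tracked")
        log = open(log_path, "ab")  # stdout and stderr are appended here
        try:
            proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
        except OSError:
            log.close()
            raise
        self._procs[name] = (proc, log)
        return proc.pid

    def running(self) -> List[str]:
        return [name for name, (proc, _) in self._procs.items() if proc.poll() is None]

    def stop_all(self, grace_s: float = 5.0) -> None:
        for name, (proc, log) in list(self._procs.items()):
            if proc.poll() is None:
                proc.terminate()  # polite stop first
                try:
                    proc.wait(timeout=grace_s)
                except subprocess.TimeoutExpired:
                    proc.kill()  # force stop after the grace period
                    proc.wait()
            log.close()
            del self._procs[name]


registry = ProcessRegistry()
registry.start("dev-server", [sys.executable, "-c", "import time; time.sleep(60)"])
print(registry.running())
registry.stop_all()
print(registry.running())
```

The point of the registry is that nothing runs in the background unless it has a name, a log destination, and a place in the cleanup path.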
Repository isolation and branch control for agent changes
Repo state is the backbone of a reviewable coding-agent workflow. The agent should not work in an ambiguous folder with unknown local edits unless the user explicitly chose that mode.
For team workflows, start every task from a known repository URL, base branch, and commit SHA. Create a task branch or detached workspace. Keep user changes separate from agent changes, and capture the exact diff before review. If the sandbox supports persistent sessions, persist the workspace intentionally; do not rely on accidental process state.
The default pattern looks like this:
1. Create isolated workspace for task-123.
2. Check out repository at main@<base_sha>.
3. Create branch agent/task-123.
4. Run dependency install according to policy.
5. Let the agent inspect, edit, test, and iterate.
6. Capture git diff, test output, generated artifacts, and preview URL.
7. Open a pull request or hand the patch to a human reviewer.
8. Tear down or archive the workspace according to retention policy.
The key detail is step 6. A useful coding agent does not just say “I fixed it.” It returns the changed files, why each change exists, what validation ran, what failed, and what remains unverified.
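Steps 2, 3, and 6 map onto ordinary git commands. The sketch below only builds the argument lists rather than executing them, so it can be reviewed or fed to a logged runner. The function names, the `agent/<task>` branch convention taken from the steps above, and the validation patterns are illustrative assumptions.

```python
import re
from typing import List

SHA_RE = re.compile(r"[0-9a-f]{7,40}")
TASK_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def setup_commands(repo_url: str, base_sha: str, task_id: str,
                   workdir: str) -> List[List[str]]:
    """Commands that pin the workspace to a known commit on a task branch."""
    if not SHA_RE.fullmatch(base_sha):
        raise ValueError("base_sha must be a hex commit id, not a branch name")
    if not TASK_RE.fullmatch(task_id):
        raise ValueError("task_id contains unsupported characters")
    branch = f"agent/{task_id}"
    return [
        # "--" stops a hostile repo_url from being parsed as an option.
        ["git", "clone", "--no-checkout", "--", repo_url, workdir],
        ["git", "-C", workdir, "checkout", "-b", branch, base_sha],
    ]


def evidence_commands(base_sha: str, workdir: str) -> List[List[str]]:
    """Commands whose output goes to the reviewer alongside test results."""
    return [
        # status lists untracked files, which plain "git diff" does not show.
        ["git", "-C", workdir, "status", "--porcelain"],
        ["git", "-C", workdir, "diff", base_sha],
    ]


for argv in setup_commands("https://example.com/org/repo.git", "0123abc",
                           "task-123", "/tmp/ws-task-123"):
    print(" ".join(argv))
```

Requiring a commit SHA instead of a branch name keeps the base reproducible even if the branch moves while the agent is working.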
Command, package, and network policies for sandboxed coding agents
Package installs are one of the hardest parts of coding-agent sandboxing. Many real tasks need dependencies. Many supply-chain incidents also start with dependency fetching, post-install scripts, or opaque binaries.
A practical policy is not “never install packages.” It is “install packages only through known paths, with logging and scope.”
| Control | Practical implementation |
|---|---|
| Package managers | Decide which package managers are available by language and repo type. |
| Registry access | Allow approved registries; block arbitrary package sources when the task does not need them. |
| Lockfiles | Prefer existing lockfiles and reproducible install commands. |
| Post-install scripts | Decide whether lifecycle scripts can run automatically or require approval. |
| System packages | Treat apt, brew, and OS package installs as higher risk than project dependency installs. |
| Caches | Use controlled package caches when you need speed and reproducibility. |
| Logging | Store package names, versions, registry URLs, checksums when available, and install output. |
Network policy should be similarly explicit. A coding agent may need to read public documentation, call a staging API, download a package, or expose a local preview. Those are different from unrestricted internet access. Separate outbound package fetches, web browsing, API calls, webhook delivery, and preview ingress. If your product handles sensitive code or data, ask whether DNS, proxy logs, and registry mirrors are covered by the same policy as HTTP traffic.
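Separating egress by purpose can start as a per-category host allowlist. The sketch below is one hedged way to express that: `EGRESS_POLICY`, its category names, and the listed hosts are examples chosen for illustration, and a real deployment would enforce the same rules at a proxy or firewall rather than in application code alone.

```python
from urllib.parse import urlsplit

# Example policy: each traffic category has its own set of allowed hosts.
EGRESS_POLICY = {
    "package_fetch": {"registry.npmjs.org", "pypi.org", "files.pythonhosted.org"},
    "docs": {"docs.python.org"},
    "api_call": {"staging.example.internal"},  # hypothetical staging host
}


def egress_allowed(category: str, url: str) -> bool:
    """Allow a request only if its exact host is listed for that category."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return host in EGRESS_POLICY.get(category, set())


checks = [
    ("package_fetch", "https://pypi.org/simple/requests/"),
    ("docs", "https://pypi.org/simple/requests/"),
    ("package_fetch", "https://pypi.org.evil.example/simple/"),
]
for category, url in checks:
    print(category, url, egress_allowed(category, url))
```

Matching on the exact parsed hostname, rather than a substring of the URL, avoids lookalike hosts slipping through, and unknown categories are denied by default.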
Secrets, logs, and audit trails for agent workspaces
Secrets should be scoped to the smallest useful surface. A coding agent normally does not need production credentials. It may need a read-only Git token, a package-registry token, a staging API key, or a preview-deployment token. Each should be task-scoped, time-limited where possible, and unavailable to commands that do not require it.
Avoid placing secrets in files the agent can read unless the task truly requires it. Prefer brokered access: the sandbox can perform an operation, but the model does not see the raw credential. When environment variables are necessary, logs should redact known secret patterns, and reviewer artifacts should not include full environment dumps.
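Log redaction can be approximated by masking the secret values the sandbox itself injected, plus a generic pattern or two. The `redact` function and the bearer-token pattern below are illustrative assumptions. Redaction of this kind is a safety net and does not replace brokered access.

```python
import re
from typing import Dict

# Generic pattern for "Bearer <token>" headers that may appear in output.
BEARER_RE = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+")
MIN_SECRET_LEN = 4  # assumed floor, so trivial values do not mangle the log


def redact(text: str, secrets: Dict[str, str]) -> str:
    """Mask known secret values by name, then mask bearer tokens."""
    # Longest values first, so a secret that contains another is fully masked.
    for name, value in sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True):
        if len(value) >= MIN_SECRET_LEN:
            text = text.replace(value, f"[REDACTED:{name}]")
    return BEARER_RE.sub(r"\1[REDACTED]", text)


task_secrets = {"STAGING_API_KEY": "abc123secret"}  # hypothetical task-scoped secret
print(redact("export KEY=abc123secret", task_secrets))
print(redact("Authorization: Bearer eyJhbGciOi.x.y", task_secrets))
```

Keeping the secret name in the placeholder lets a reviewer see which credential a command touched without seeing its value.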
For audit trails, store more than the final patch:
- User request and task metadata.
- Repository URL, base commit, branch, and final commit or diff.
- Commands requested, approved, blocked, and executed.
- Command outputs, exit codes, and timeouts.
- File reads and writes when the platform can capture them.
- Network and package-fetch records at the level your policy supports.
- Preview URLs and generated artifact paths.
- Human approvals and merge decisions.
This is not bureaucracy. It is how a reviewer distinguishes a real fix from a plausible story.
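The audit items above fit naturally into one record per task. The following is a minimal sketch under the assumption that records are written as JSON lines; `AuditRecord`, its field names, and the status values are hypothetical and mirror the list rather than any specific product schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

COMMAND_STATUSES = {"requested", "approved", "blocked", "executed"}


@dataclass
class AuditRecord:
    task_id: str
    user_request: str
    repo_url: str
    base_sha: str
    branch: str
    final_sha: Optional[str] = None
    commands: List[dict] = field(default_factory=list)
    network_fetches: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)
    approvals: List[dict] = field(default_factory=list)

    def log_command(self, argv: List[str], status: str,
                    exit_code: Optional[int] = None) -> None:
        if status not in COMMAND_STATUSES:
            raise ValueError(f"unknown command status: {status}")
        self.commands.append({"argv": list(argv), "status": status,
                              "exit_code": exit_code})

    def to_json_line(self) -> str:
        """Serialize the whole record as one JSON line for append-only storage."""
        return json.dumps(asdict(self), sort_keys=True)


audit = AuditRecord(task_id="task-123", user_request="Fix failing login test",
                    repo_url="https://example.com/org/repo.git",
                    base_sha="0123abc", branch="agent/task-123")
audit.log_command(["pytest", "-q"], "executed", exit_code=0)
audit.log_command(["git", "push"], "blocked")
print(audit.to_json_line())
```

Recording blocked and merely requested commands alongside executed ones is what lets a reviewer check the agent's summary against what it actually attempted.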
Diffs, previews, and review gates before merge
The most useful output from a coding agent is a reviewable change set. That means the sandbox should produce the same artifacts a careful engineer would expect from a pull request:
- A focused diff.
- Tests or build commands that were run.
- Failures that remain.
- Screenshots, preview URLs, or downloadable files when UI or generated assets changed.
- A short explanation of the intended behavior change.
Keep the final merge or deployment behind a human-controlled gate unless your organization has built a separate trusted automation policy for that exact repository and risk level. Human review is especially important when changes touch authentication, billing, data access, network calls, infrastructure, dependency versions, generated migrations, or user-visible content.
Preview handling deserves its own rule: expose only the service and port required for review. A sandbox that starts a web app should give reviewers a scoped preview URL, not broad network access into the workspace.
Cleanup and reset strategy for long-running agent sessions
Every sandbox needs a lifecycle. Without one, long-running coding-agent infrastructure becomes a pile of stale workspaces, leaked logs, and still-running processes.
For short tasks, an ephemeral model works well: create a sandbox, run the job, extract artifacts, then destroy it. For larger tasks, persistence can be valuable: the agent may need to pause, wait for review, resume from the same branch, or keep a dev server running during a review session. Persistence should be an explicit product feature with expiry, owner, and retention rules.
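Making persistence explicit can be as simple as attaching an owner and a time-to-live to every workspace. The sketch below assumes a hypothetical `Workspace` record and example TTL values. It only selects workspaces that are due for cleanup and leaves the actual teardown to the platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Workspace:
    task_id: str
    owner: str
    created_at: datetime
    persistent: bool
    ttl: timedelta  # every workspace expires, persistent or not


def expires_at(ws: Workspace) -> datetime:
    return ws.created_at + ws.ttl


def due_for_cleanup(workspaces: List[Workspace], now: datetime) -> List[str]:
    """Return task ids whose expiry time has passed."""
    return [ws.task_id for ws in workspaces if now >= expires_at(ws)]


created = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
fleet = [
    # Example TTLs: one hour for ephemeral runs, one day for a paused review.
    Workspace("task-1", "alice", created, persistent=False, ttl=timedelta(hours=1)),
    Workspace("task-2", "bob", created, persistent=True, ttl=timedelta(hours=24)),
]
print(due_for_cleanup(fleet, now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

Giving persistent workspaces a longer TTL instead of no TTL keeps the owner accountable for renewing a session that is still needed.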
Define cleanup for:
- Background processes and open ports.
- Temporary files and build outputs.
- Package caches and downloaded archives.
- Task-scoped secrets.
- Logs and artifacts.
- Branches or worktrees that have been superseded.
Reset is just as important. A reviewer should be able to rerun the agent’s validation from the base commit or the final branch. If the result only works because of invisible state inside a long-lived session, the workflow is hard to trust.
Where Novita Agent Sandbox fits in this workflow
Novita Agent Sandbox is designed for agent infrastructure where code execution, browser automation, computer-use style workflows, data analysis, evaluations, and longer-running agent workflows need an isolated runtime. The Novita Agent Sandbox documentation describes the product as a stateful environment for running agent workloads, with SDK and CLI paths for working with sandbox lifecycle, files, commands, browser sessions, and related workflow primitives.
For teams already using Novita AI model APIs, a sandbox layer can reduce the gap between model inference and action execution. The model can reason, call tools, and plan code changes; the sandbox can provide the isolated workspace where those actions are executed, logged, previewed, and reviewed.
Use conservative product boundaries when designing your workflow:
- Treat Novita Agent Sandbox as the execution environment, not a blanket security guarantee.
- Keep secrets, package installs, egress, and publish actions behind your own policy.
- Validate current SDK, CLI, pricing, and account-limit details from Novita documentation before hard-coding them into production automation.
- Evaluate isolation boundaries, third-party agent compatibility, and compliance requirements against your own policy before relying on any sandbox in production.
That separation keeps implementation guidance useful even when the agent layer changes. You can use Codex-style agents, internal coding agents, browser agents, or evaluation workers while keeping the same sandbox control questions.
Coding agent sandbox implementation checklist
Use this checklist before moving a coding-agent sandbox beyond a prototype.
| Area | Minimum production question |
|---|---|
| Workspace | Does each task get a scoped filesystem and known repo base commit? |
| Branching | Are agent changes isolated on a branch or patch that reviewers can inspect? |
| Terminal | Are commands logged with working directory, output, exit code, and timeout? |
| Approval | Which commands run automatically, require approval, or are blocked? |
| Packages | Are dependency installs reproducible and logged? |
| Network | Is egress separated by package fetches, docs browsing, API calls, and preview access? |
| Secrets | Are credentials task-scoped and redacted from logs? |
| Previews | Are preview ports explicit and easy to shut down? |
| Artifacts | Are generated files, screenshots, reports, and logs attached to the review? |
| Persistence | Is session pause/resume intentional, with owner and expiry? |
| Cleanup | Are processes, ports, temp files, secrets, and stale workspaces removed? |
| Review | Does a human approve merge, publish, or deployment for risky changes? |
If your current setup cannot answer several of these questions, keep the workflow in a prototype lane. The agent may still be useful, but it should not receive broad repository, network, or credential access.
FAQ
Can I run Codex itself inside a cloud sandbox?
Conceptually, yes: a terminal coding agent can be run inside an isolated workspace if the environment supports the operating system, authentication path, terminal I/O, filesystem access, and network access the agent requires. Do not assume an official integration or full compatibility unless the sandbox provider and the agent provider document it for your exact setup.
Is Docker enough for a coding-agent sandbox?
Docker can be useful for local development, CI jobs, and repeatable environments, but “enough” depends on your threat model. Containers share the host kernel by default, so ask what file mounts exist, how network egress is controlled, whether secrets are exposed to the container, and how escapes or dependency compromise would be handled. For sensitive workloads, security teams often evaluate stronger isolation boundaries, such as microVMs or user-space kernels, together with stricter egress controls.
Should a coding agent have internet access?
Only when the task needs it, and only through a policy you can explain. Documentation lookup, package registry access, staging API calls, and arbitrary browsing are different permissions. Log what the agent fetched, keep package installs reproducible, and avoid giving production network access to a general-purpose coding session.
What should a reviewer look at before merging agent-generated code?
Review the diff, the commands that ran, test/build output, dependency changes, generated artifacts, preview behavior, and any skipped validation. Pay extra attention to auth, permissions, data handling, network calls, migrations, install scripts, and secrets.
How does Novita help with coding-agent sandboxes?
Novita Agent Sandbox provides an isolated agent runtime for workloads such as code execution, browser automation, computer-use style tasks, data analysis, evaluations, and longer-running workflows. Pair it with explicit repository, command, package, network, secrets, and review policies when building a coding-agent workflow.
