Run Harbor Agent Evaluations on Novita Agent Sandbox

Harbor is a framework for evaluating and optimizing agents and language models. It is designed around benchmark tasks, containerized environments, parallel trials, and rollout generation for optimization workflows. For teams evaluating coding agents or tool-using agents, the execution environment is not a side detail: it determines how tasks are built, how commands run, how files move in and out, and how verifier logs are collected.

This post looks at the Novita environment code path in Harbor and how it maps to Novita Agent Sandbox. The scope is intentionally narrow: this is an implementation-oriented overview, not a partnership announcement, not a benchmark, and not a cost comparison.

What Harbor Needs From A Cloud Sandbox

Harbor tasks define an instruction, tests, optional solution logic, and an environment. The environment is usually represented by files under an environment/ directory. Harbor’s task docs explain that the required files depend on the selected environment type. Docker can use a Dockerfile or Compose file, while most cloud sandbox providers support Dockerfile-defined environments rather than Docker Compose.

That model matters for agent evaluations. A Harbor run needs to create an isolated task environment, execute agent and verifier commands, transfer files such as tests and artifacts, and then clean up the runtime. When a run scales from a few local trials to many remote trials, the sandbox provider becomes part of the evaluation harness.

Where Novita Agent Sandbox Fits

Novita Agent Sandbox is a cloud sandbox runtime for AI agents that execute generated code. The product docs describe a secure, isolated sandbox environment, multi-language execution support, pause/resume, background execution, and SDK/CLI management. In a Harbor context, the relevant runtime surface is practical: create a sandbox, run commands, move files, and manage sandbox lifecycle.

On its main branch, the Harbor source tree includes a Novita environment code path: a novita environment type, factory wiring for harbor.environments.novita.NovitaEnvironment, and source-level optional dependency wiring for Novita Agent Sandbox. This describes the state of the source tree, not released PyPI support.

Integration Shape

At a high level, the Novita environment path in Harbor maps a Harbor task environment to a Novita sandbox template and runtime session:

  • Harbor reads the task environment definition, typically from an environment/Dockerfile.
  • The Novita environment implementation builds or reuses a Novita sandbox template for that environment.
  • Harbor creates a sandbox from the template for the evaluation trial.
  • Agent, verifier, and setup commands run inside the sandbox.
  • Files are uploaded to and downloaded from the sandbox as required by the Harbor task lifecycle.
  • The sandbox is stopped or cleaned up when the trial completes.
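The trial steps above can be sketched in a few lines of Python. This is an illustration of the lifecycle shape only, not Harbor's or Novita's actual API: the Sandbox protocol, the FakeSandbox stand-in, the run_trial helper, and the reward.txt path are all hypothetical names chosen for this example. The in-memory fake lets the sketch run without any cloud account.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Sandbox(Protocol):
    """Hypothetical sandbox interface; NOT the real Harbor or Novita API."""

    def run(self, command: str) -> int: ...
    def upload(self, path: str, data: bytes) -> None: ...
    def download(self, path: str) -> bytes: ...
    def stop(self) -> None: ...


@dataclass
class FakeSandbox:
    """In-memory stand-in so the sketch runs locally."""

    files: Dict[str, bytes] = field(default_factory=dict)
    commands: List[str] = field(default_factory=list)
    stopped: bool = False

    def run(self, command: str) -> int:
        self.commands.append(command)
        # Assumption for the demo: a verifier command leaves a reward file.
        if "verify" in command:
            self.files["reward.txt"] = b"1"
        return 0  # pretend every command succeeds

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def download(self, path: str) -> bytes:
        return self.files[path]

    def stop(self) -> None:
        self.stopped = True


def run_trial(
    sandbox: Sandbox,
    tests: Dict[str, bytes],
    agent_cmd: str,
    verifier_cmd: str,
    artifact_path: str,
) -> bytes:
    """Upload tests, run agent then verifier, fetch one artifact, always clean up."""
    try:
        for path, data in tests.items():
            sandbox.upload(path, data)
        sandbox.run(agent_cmd)
        sandbox.run(verifier_cmd)
        return sandbox.download(artifact_path)
    finally:
        # Cleanup runs even if a command or transfer raises.
        sandbox.stop()
```

The try/finally is the one design choice worth noting: when many remote trials run in parallel, a trial that fails midway should still release its sandbox.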

This is useful because it keeps Harbor’s evaluation abstraction intact. Task authors still reason in Harbor terms: instructions, tests, reward files, artifacts, and environment files. The sandbox provider handles the remote execution environment behind that interface.

Current Release Status

On main, the Harbor source tree includes a Novita environment code path: source code for a novita environment type, a NovitaEnvironment implementation, and source-level optional dependency wiring for Novita Agent Sandbox. Treat this as source-tree guidance for now, not as a runnable PyPI quickstart.

As of the current publication check, Harbor PyPI release 0.7.0 does not include the novita extra, does not install novita-sandbox through a Harbor Novita extra, and does not expose novita as a valid CLI environment value. For that reason, this post does not include commands that install a Novita Harbor extra or run Harbor with a Novita CLI environment against the PyPI package. Those belong here only once a Harbor release containing this interface is available.

The safe guidance for readers is that Harbor’s main branch contains the Novita environment code path, while the released PyPI package has not yet shipped that interface. Once Harbor publishes a release that includes the Novita extra and CLI environment, this article can be updated with tested installation and run commands.

A Minimal Task Mental Model

A Harbor task generally includes:

  • instruction.md for the agent-facing task.
  • task.toml for task metadata and runtime configuration.
  • environment/ for the container environment definition.
  • tests/ for verifier logic.
  • An optional solution/ directory for oracle or sanity-check workflows.
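As a quick sanity check, the layout above can be verified with a short script. This is a minimal sketch using only the entry names listed in this section. The helper name missing_task_entries is hypothetical, and treating the first four entries as expected is an assumption of this example rather than a statement of Harbor's own validation rules.

```python
from pathlib import Path
from typing import List

# Entry names taken from the list above; "solution" is optional and not checked.
EXPECTED_ENTRIES = ("instruction.md", "task.toml", "environment", "tests")


def missing_task_entries(task_dir: Path) -> List[str]:
    """Return the expected entries that are absent from task_dir, in list order."""
    return [name for name in EXPECTED_ENTRIES if not (task_dir / name).exists()]
```

Calling missing_task_entries(Path("my-task")) returns an empty list when all four entries are present.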

For a cloud sandbox provider, keep the environment definition portable. Harbor’s docs note that most cloud sandbox providers only support Dockerfile-defined environments, so a Docker Compose-based task should be reviewed before assuming it can run remotely.
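One way to make that review mechanical is to classify each task's environment/ directory before scheduling it remotely. The sketch below assumes the conventional file names Dockerfile and the standard Docker Compose file names. Harbor's own detection logic may differ, and environment_kind is a hypothetical helper.

```python
from pathlib import Path

# Conventional Docker Compose file names (an assumption about task layout).
COMPOSE_NAMES = (
    "docker-compose.yaml",
    "docker-compose.yml",
    "compose.yaml",
    "compose.yml",
)


def environment_kind(env_dir: Path) -> str:
    """Classify an environment directory as 'compose', 'dockerfile', or 'unknown'."""
    # Compose is checked first: a Compose-based task may also ship a Dockerfile.
    if any((env_dir / name).is_file() for name in COMPOSE_NAMES):
        return "compose"
    if (env_dir / "Dockerfile").is_file():
        return "dockerfile"
    return "unknown"
```

Tasks classified as "compose" are the ones to inspect by hand before assuming they can run on a cloud sandbox provider.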

What This Is Not Claiming

This integration path should be described precisely. The claims in this post are backed by public sources and concern code-level state on Harbor main and Novita Agent Sandbox product capabilities, not business or benchmark results. The post does not claim that Novita and Harbor have announced an official partnership; that would require a public announcement. It does not claim that the Novita path is faster, more affordable, or more reliable than other Harbor environment providers; that would require a benchmark or pricing comparison to cite. It does not imply that Harbor’s public docs currently recommend Novita as the default cloud provider.

The strongest supportable statement is narrower and more useful. Harbor’s main branch contains a Novita environment code path, and Novita Agent Sandbox provides the sandbox runtime primitives that a Harbor evaluation environment needs: isolated execution, command execution, file operations, template-based setup, and lifecycle management.

How To Use This Today

Use this article as an architectural overview and release-status note, not as a copy-paste quickstart. If you are evaluating Harbor today from the current PyPI package, check the installed Harbor version and CLI help before assuming the Novita environment is available. If the installed release does not expose the Novita extra or CLI environment, wait for a Harbor release that ships that interface before publishing runnable commands or adding the path to production evaluation docs.
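The version check can be done from Python's standard library without running any Harbor command. The sketch below reads installed package metadata with importlib.metadata and reports which extras a distribution declares. The helper names are hypothetical, and the distribution name harbor and extra name novita are taken from this article, so confirm them against your own environment.

```python
from importlib import metadata
from typing import List, Optional


def declared_extras(dist_name: str) -> Optional[List[str]]:
    """Return the extras an installed distribution declares, or None if not installed."""
    try:
        dist_meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # "Provides-Extra" may be absent entirely when no extras are declared.
    return sorted(dist_meta.get_all("Provides-Extra") or [])


def has_extra(dist_name: str, extra: str) -> bool:
    """True only if the distribution is installed and declares the given extra."""
    extras = declared_extras(dist_name)
    return extras is not None and extra in extras
```

For example, has_extra("harbor", "novita") returning False means the installed package does not declare that extra, which matches the release boundary described above. metadata.version("harbor") reports the installed version when the package is present.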

Tested command blocks can be added to this article after Harbor publishes a release containing the Novita extra and CLI environment. Until then, the boundary stays clear: Harbor main contains the Novita environment code path, while users of the current PyPI package should not expect Novita-specific Harbor commands to be available.

FAQ

Does Harbor support Novita Agent Sandbox?

Harbor’s main branch includes a Novita environment path that maps Harbor evaluation environments to Novita Agent Sandbox. Treat it as source-tree support until a Harbor release ships the Novita extra and CLI environment.

Can I install Harbor with Novita support from PyPI today?

Not from the current verified PyPI release. The latest checked Harbor package, harbor 0.7.0, does not include the Novita extra or the Novita environment implementation, so this post does not present it as a ready-to-run install path.

Why are there no runnable Harbor commands in this post?

Runnable commands would imply that the Novita path is available in the released Harbor package. Until the package and CLI surface are released and tested, the safer guidance is to explain the integration shape and the current release boundary.

What changes after Harbor releases Novita support?

After Harbor publishes a release with the Novita extra and CLI environment, this article should be updated into a practical quickstart with verified installation steps, environment variables, a Dockerfile-based task example, and expected validation output.
