Targets Configuration
Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.
Structure

    targets:
      - name: azure-base
        provider: azure
        endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
        api_key: ${{ AZURE_OPENAI_API_KEY }}
        model: ${{ AZURE_DEPLOYMENT_NAME }}

      - name: vscode_dev
        provider: vscode
        workspace_template: ${{ WORKSPACE_PATH }}
        judge_target: azure-base

      - name: local_agent
        provider: cli
        command: 'python agent.py --prompt {PROMPT}'
        judge_target: azure-base

Environment Variables
Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

    targets:
      - name: my_target
        provider: anthropic
        api_key: ${{ ANTHROPIC_API_KEY }}
        model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files.
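
To make the substitution concrete, here is a minimal Python sketch of how a ${{ VARIABLE_NAME }} reference could be expanded. It is an illustration only, not AgentV's resolver: the function name `resolve_placeholders`, the tolerance for missing whitespace inside the braces, and the choice to raise on an unset variable are all assumptions.

```python
import re

# Matches ${{ NAME }}; whitespace inside the braces is treated as optional (assumption).
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_placeholders(value: str, env: dict) -> str:
    """Replace each ${{ NAME }} in value with env[NAME]; raise if NAME is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Failing loudly is an assumption; a real tool might behave differently.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(substitute, value)
```

Passing `os.environ` (after loading the .env file) as `env` would resolve a value such as `${{ ANTHROPIC_API_KEY }}` to the secret itself, so the YAML file only ever contains the variable name.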
Supported Providers
| Provider | Type | Description |
|---|---|---|
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude | Agent | Claude Agent SDK |
| codex | Agent | Codex CLI |
| pi-coding-agent | Agent | Pi Coding Agent |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command |
| mock | Testing | Mock provider for dry runs |
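
As a rough aid for catching typos before a run, the sketch below checks a parsed targets list against the provider names in the table and confirms that every judge_target names a defined target. The function `check_targets` is hypothetical and is not part of AgentV; it assumes the YAML has already been loaded into a list of dictionaries.

```python
# Provider names copied from the table above.
SUPPORTED_PROVIDERS = {
    "azure", "anthropic", "gemini", "claude", "codex",
    "pi-coding-agent", "vscode", "vscode-insiders", "cli", "mock",
}


def check_targets(targets: list) -> list:
    """Return human-readable problems; an empty list means none were found."""
    names = {t.get("name") for t in targets}
    problems = []
    for t in targets:
        label = t.get("name", "<unnamed>")
        if t.get("provider") not in SUPPORTED_PROVIDERS:
            problems.append(f"{label}: unknown provider {t.get('provider')!r}")
        judge = t.get("judge_target")
        # judge_target is optional; only check it when it is present.
        if judge is not None and judge not in names:
            problems.append(f"{label}: judge_target {judge!r} is not a defined target")
    return problems
```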
Referencing Targets in Evals
Set the default target at the top level or override per case:

    # Top-level default
    execution:
      target: azure-base

    tests:
      - id: test-1
        # Uses azure-base

      - id: test-2
        execution:
          target: vscode_dev  # Override for this case

Judge Target
Agent targets that need LLM-based evaluation specify a judge_target, the LLM used to run LLM judge evaluators:

    targets:
      - name: codex_target
        provider: codex
        judge_target: azure-base  # LLM used for judging

Workspace Template
For agent targets, workspace_template specifies a directory that gets copied to a temporary location before each test runs. This provides isolated, reproducible workspaces.

    targets:
      - name: claude_agent
        provider: claude
        workspace_template: ./workspace-templates/my-project
        judge_target: azure-base

When workspace_template is set:
- The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/shared/
- The .git directory is skipped during copy
- Tests share the workspace; use hooks.after_each to reset state between tests
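
The copy step described in the list above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not AgentV's implementation: the function name `copy_template` and the `workspaces_root` parameter are hypothetical, and `shutil.ignore_patterns(".git")` skips any entry named .git at any depth, which may be broader than what the tool does.

```python
import shutil
from pathlib import Path


def copy_template(template_dir: str, workspaces_root: str, eval_run_id: str) -> Path:
    """Copy template_dir to <workspaces_root>/<eval_run_id>/shared, skipping .git."""
    destination = Path(workspaces_root) / eval_run_id / "shared"
    shutil.copytree(
        template_dir,
        destination,
        ignore=shutil.ignore_patterns(".git"),  # leave repository metadata behind
    )
    return destination
```

With `workspaces_root` set to the expanded ~/.agentv/workspaces path, the returned directory has the same layout as the location given in the first bullet.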
Workspace Lifecycle Hooks
Use workspace.hooks to run commands and apply reset or cleanup policies at different lifecycle points. Hooks can be defined at the suite level (applies to all tests) or per test (overrides suite-level).

    workspace:
      template: ./workspace-templates/my-project
      hooks:
        before_all:
          command: ["bun", "run", "setup.ts"]
          timeout_ms: 120000
          cwd: ./scripts
        after_each:
          command: ["bun", "run", "reset.ts"]
          timeout_ms: 5000
          reset: fast
        after_all:
          command: ["bun", "run", "cleanup.ts"]
          timeout_ms: 30000

| Field | Description |
|---|---|
| template | Directory to copy as workspace (alternative to target-level workspace_template) |
| hooks.before_all | Runs once after workspace creation, before the first test |
| hooks.after_all | Runs once after the last test, before cleanup |
| hooks.before_each | Runs before each test |
| hooks.after_each | Runs after each test (supports both command and reset) |
Each hook config accepts:
| Field | Description |
|---|---|
| command | Command array (e.g., ["bun", "run", "setup.ts"]) |
| reset | Reset mode: none, fast, strict |
| clean | Cleanup mode: always, on_success, on_failure, never |
| timeout_ms | Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks) |
| cwd | Working directory (relative paths resolved against eval file directory) |
Lifecycle order: template copy → hooks.before_all → git baseline → (hooks.before_each → agent runs → file changes captured → hooks.after_each) × N tests → hooks.after_all → cleanup
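
The ordering above can be spelled out for a concrete suite. The helper below is purely illustrative (the name `lifecycle_steps` is hypothetical) and only enumerates step labels; it does not run anything.

```python
def lifecycle_steps(test_ids: list) -> list:
    """List the lifecycle steps, in order, for a suite with the given tests."""
    steps = ["template copy", "hooks.before_all", "git baseline"]
    for test_id in test_ids:
        # The per-test block repeats once for every test in the suite.
        steps += [
            f"hooks.before_each ({test_id})",
            f"agent runs ({test_id})",
            f"file changes captured ({test_id})",
            f"hooks.after_each ({test_id})",
        ]
    steps += ["hooks.after_all", "cleanup"]
    return steps
```

For two tests this yields thirteen steps: three setup steps, four per test, then hooks.after_all and cleanup.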
Shared workspace: The workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset to reset state between tests (e.g., fast/strict).
Error handling:
- hooks.before_all / hooks.before_each command failure aborts the test with an error result
- hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
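
The difference between setup and teardown failures can be sketched as follows. This is not AgentV's runner: `run_hook` is a hypothetical helper, and treating a timeout the same way as a non-zero exit code is an assumption.

```python
import subprocess
import sys
from typing import Optional

SETUP_HOOKS = {"before_all", "before_each"}


def run_hook(name: str, command: list, timeout_ms: int, cwd: Optional[str] = None) -> bool:
    """Run one hook command; return True on success.

    A failing setup hook raises; a failing teardown hook only warns.
    """
    try:
        result = subprocess.run(command, cwd=cwd, timeout=timeout_ms / 1000)
        failed = result.returncode != 0
    except subprocess.TimeoutExpired:
        failed = True  # assumption: a timeout counts as a failure
    if not failed:
        return True
    if name in SETUP_HOOKS:
        raise RuntimeError(f"hooks.{name} failed; aborting the test")
    print(f"warning: hooks.{name} failed (non-fatal)", file=sys.stderr)
    return False
```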
Script context: All scripts receive a JSON object on stdin with case context:

    {
      "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
      "test_id": "case-01",
      "eval_run_id": "run-123",
      "case_input": "Fix the bug",
      "case_metadata": {
        "repo": "sympy/sympy",
        "base_commit": "abc123"
      }
    }

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
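
A hook script consumes the stdin context object from the Script context example by parsing it as JSON. The sketch below uses only field names that appear in that example; the function names are hypothetical, and the docs' own examples use Bun scripts rather than Python.

```python
import json
import sys
from typing import TextIO


def read_context(stream: TextIO) -> dict:
    """Parse the JSON context object that a hook script receives on stdin."""
    return json.load(stream)


def describe(context: dict) -> str:
    """Build a one-line summary from fields shown in the example payload."""
    metadata = context.get("case_metadata", {})
    return (
        f"test {context['test_id']} in {context['workspace_path']} "
        f"(repo: {metadata.get('repo', 'n/a')})"
    )


def main() -> None:
    # A real hook script would call main() as its entry point.
    print(describe(read_context(sys.stdin)))
```

A script built this way could then be referenced from a hook's command array in the same way as the Bun scripts shown earlier.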
Repository Lifecycle
Clone git repositories into the workspace automatically, with caching for fast repeat runs. Define repos at the suite level or per test:

    workspace:
      repos:
        - path: ./my-repo
          source:
            type: git
            url: https://github.com/org/repo.git
          checkout:
            ref: main
            ancestor: 1  # check out the parent commit
          clone:
            depth: 10  # shallow clone
        - path: ./local-copy
          source:
            type: local
            path: /home/user/projects/my-project
      hooks:
        after_each:
          reset: fast  # none | fast | strict
      isolation: shared  # shared (default) | per_test
      mode: pooled  # pooled | ephemeral | static
      static_path: /tmp/my-ws  # required when mode=static

| Field | Description |
|---|---|
| repos[].path | Directory within the workspace to clone into |
| repos[].source.type | git (remote URL) or local (absolute path) |
| repos[].checkout.ref | Branch, tag, or SHA to check out (default: HEAD) |
| repos[].checkout.resolve | remote (ls-remote, default for git) or local |
| repos[].checkout.ancestor | Walk N commits back from ref (e.g., 1 for parent) |
| repos[].clone.depth | Shallow clone depth |
| repos[].clone.filter | Partial clone filter (e.g., blob:none) |
| repos[].clone.sparse | Sparse checkout paths |
| hooks.after_each.reset | Reset policy after each test: none, fast, strict |
| isolation | shared reuses one workspace; per_test creates a fresh copy per test |
| mode | Workspace mode: pooled, ephemeral, static |
| static_path | Existing workspace path used when mode=static |
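
To show how the clone and checkout fields relate to plain git, the sketch below turns one repos[] entry into git command lines. It is a guess at an equivalent, not what AgentV actually executes: `git_commands` is a hypothetical helper, caching and checkout.resolve are ignored, and clone.sparse is left out. The `ref~N` form is git's standard notation for the Nth ancestor of a revision.

```python
def git_commands(repo: dict) -> list:
    """Build git argv lists for one repos[] entry with source.type == "git"."""
    clone_opts = repo.get("clone", {})
    checkout = repo.get("checkout", {})

    clone = ["git", "clone"]
    if "depth" in clone_opts:
        clone += ["--depth", str(clone_opts["depth"])]
    if "filter" in clone_opts:
        clone.append(f"--filter={clone_opts['filter']}")
    clone += [repo["source"]["url"], repo["path"]]

    # ref~N names the commit N steps back along first parents.
    revision = checkout.get("ref", "HEAD")
    ancestor = checkout.get("ancestor", 0)
    if ancestor:
        revision = f"{revision}~{ancestor}"

    return [clone, ["git", "-C", repo["path"], "checkout", revision]]
```

Note that a shallow clone must be deep enough to contain the requested ancestor, which is presumably why the example above pairs ancestor: 1 with depth: 10 rather than depth: 1.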
Pooling: mode: pooled (or the default shared repo mode) reuses pool slots between runs. Use mode: ephemeral to disable pooling and get a fresh clone and checkout on each run.
Pool management commands:
- agentv workspace list: list all pool entries with size and repo info
- agentv workspace clean: remove all pool entries
Common patterns:

    # Pinned commit with shallow clone (fast CI runs)
    workspace:
      repos:
        - path: ./repo
          source:
            type: git
            url: https://github.com/org/repo.git
          checkout:
            ref: abc123def
          clone:
            depth: 1

    # Multi-repo shared workspace with reset
    workspace:
      repos:
        - path: ./frontend
          source: { type: git, url: https://github.com/org/frontend.git }
        - path: ./backend
          source: { type: git, url: https://github.com/org/backend.git }
      hooks:
        after_each:
          reset: fast

Cleanup Behavior
Default finish behavior:
- Success: cleanup
- Failure: keep
CLI overrides:
- --retain-on-success keep|cleanup
- --retain-on-failure keep|cleanup
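
The defaults and overrides combine into a simple decision, sketched here in Python. `finish_action` is a hypothetical helper whose two optional arguments stand in for the values passed to --retain-on-success and --retain-on-failure; how AgentV implements this internally is not specified here.

```python
from typing import Optional


def finish_action(
    succeeded: bool,
    retain_on_success: Optional[str] = None,
    retain_on_failure: Optional[str] = None,
) -> str:
    """Return "keep" or "cleanup" for a finished run."""
    if succeeded:
        return retain_on_success or "cleanup"  # default: clean up after success
    return retain_on_failure or "keep"  # default: keep the workspace after failure
```

Keeping failed workspaces by default means the files an agent left behind remain available for debugging.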
cwd vs workspace_template
| Option | Use Case |
|---|---|
| cwd | Run in an existing directory (shared across tests) |
| workspace_template | Copy template to temp location (isolated per case) |
These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
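
The selection rule can be expressed as a small function. This is an illustrative sketch only: `working_directory_source` is a hypothetical name, and raising ValueError when both options are set is an assumption about how the conflict would be reported.

```python
from pathlib import Path
from typing import Optional


def working_directory_source(
    eval_file: str,
    cwd: Optional[str] = None,
    workspace_template: Optional[str] = None,
) -> tuple:
    """Return (kind, path) describing where the agent would run."""
    if cwd is not None and workspace_template is not None:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    if cwd is not None:
        return ("cwd", cwd)
    if workspace_template is not None:
        # The template itself is not the working directory; a copy of it is.
        return ("workspace_template", workspace_template)
    return ("eval_dir", str(Path(eval_file).parent))
```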