
Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.

targets:
  - name: azure-base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}

  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt {PROMPT}'
    judge_target: azure-base

Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files.
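A minimal sketch of how `${{ VARIABLE_NAME }}` placeholders could be expanded, assuming the values have already been loaded into a dict (for example, parsed from `.env`). The `interpolate` helper is hypothetical, not AgentV's loader, and treating an undefined variable as an error is an assumption.

```python
import re

# Matches ${{ NAME }}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env: dict[str, str]) -> str:
    """Replace every ${{ NAME }} in value with env[NAME] (hypothetical helper)."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of substituting an empty string.
            raise KeyError(f"undefined variable: {name}")
        return env[name]

    return PLACEHOLDER.sub(lookup, value)


# Illustrative values only; real ones would come from your .env file.
env = {"ANTHROPIC_API_KEY": "sk-test", "ANTHROPIC_MODEL": "claude-model"}
target = {
    "name": "my_target",
    "provider": "anthropic",
    "api_key": "${{ ANTHROPIC_API_KEY }}",
    "model": "${{ ANTHROPIC_MODEL }}",
}
resolved = {key: interpolate(val, env) for key, val in target.items()}
print(resolved["api_key"])  # sk-test
```

Values without placeholders, such as `name` and `provider`, pass through unchanged.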

Provider          Type      Description
azure             LLM       Azure OpenAI
anthropic         LLM       Anthropic Claude API
gemini            LLM       Google Gemini
claude            Agent     Claude Agent SDK
codex             Agent     Codex CLI
pi-coding-agent   Agent     Pi Coding Agent
vscode            Agent     VS Code with Copilot
vscode-insiders   Agent     VS Code Insiders
cli               Agent     Any CLI command
mock              Testing   Mock provider for dry runs

Set the default target at the top level or override per case:

# Top-level default
execution:
  target: azure-base

tests:
  - id: test-1
    # Uses azure-base
  - id: test-2
    execution:
      target: vscode_dev  # Override for this case
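The override rule can be sketched in Python over the parsed YAML, shown here as plain dicts. `resolve_target` is a hypothetical helper, not part of AgentV, and raising an error when no target is configured anywhere is an assumption.

```python
def resolve_target(suite: dict, test: dict) -> str:
    """Pick the target for one test: a per-test execution.target wins,
    otherwise the suite-level execution.target applies (hypothetical)."""
    test_target = test.get("execution", {}).get("target")
    if test_target is not None:
        return test_target
    suite_target = suite.get("execution", {}).get("target")
    if suite_target is None:
        # Assumption: no target anywhere is treated as a configuration error.
        raise ValueError(f"no target configured for test {test.get('id')!r}")
    return suite_target


suite = {
    "execution": {"target": "azure-base"},
    "tests": [
        {"id": "test-1"},
        {"id": "test-2", "execution": {"target": "vscode_dev"}},
    ],
}
for test in suite["tests"]:
    print(test["id"], "->", resolve_target(suite, test))
# test-1 -> azure-base
# test-2 -> vscode_dev
```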

Agent targets that need LLM-based evaluation specify a judge_target, which names the LLM target that runs the LLM judge evaluators:

targets:
  - name: codex_target
    provider: codex
    judge_target: azure-base  # LLM used for judging
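Because a judge_target must point at an LLM, a config check can catch typos and misconfigured references before a run starts. This is a hypothetical sketch, not AgentV's validator: the set of LLM providers is taken from the provider table above, and rejecting non-LLM judges (including `mock`) is an assumption.

```python
# Providers listed with type "LLM" in the provider table.
LLM_PROVIDERS = {"azure", "anthropic", "gemini"}


def check_judge_targets(targets: list[dict]) -> list[str]:
    """Return problems with judge_target references (empty list means OK)."""
    by_name = {t["name"]: t for t in targets}
    problems = []
    for target in targets:
        judge_name = target.get("judge_target")
        if judge_name is None:
            continue  # this target does not use LLM judge evaluators
        judge = by_name.get(judge_name)
        if judge is None:
            problems.append(f"{target['name']}: unknown judge_target {judge_name!r}")
        elif judge["provider"] not in LLM_PROVIDERS:
            problems.append(
                f"{target['name']}: judge_target {judge_name!r} is not an LLM provider"
            )
    return problems


targets = [
    {"name": "azure-base", "provider": "azure"},
    {"name": "codex_target", "provider": "codex", "judge_target": "azure-base"},
    {"name": "broken", "provider": "cli", "judge_target": "codex_target"},
]
print(check_judge_targets(targets))
# ["broken: judge_target 'codex_target' is not an LLM provider"]
```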

For agent targets, workspace_template specifies a directory that gets copied to a temporary location before each test runs. This provides isolated, reproducible workspaces.

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

When workspace_template is set:

  • The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/shared/
  • The .git directory is skipped during copy
  • Tests share the workspace; use hooks.after_each to reset state between tests
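The copy step in the list above behaves much like Python's `shutil.copytree` with an ignore rule for `.git`. This is only an illustration, not AgentV's code; the template layout and the `run-123` destination are made up. Note that this ignore rule also skips nested `.git` entries (for example in submodules); whether AgentV does the same is not stated here.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(template: Path, dest: Path) -> Path:
    """Copy a workspace template to dest, leaving out .git directories."""
    # ignore_patterns(".git") is matched against entry names in every
    # directory, so nested .git entries are skipped as well.
    shutil.copytree(template, dest, ignore=shutil.ignore_patterns(".git"))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway template with a .git directory and one source file.
    template = Path(tmp, "my-project")
    (template / ".git").mkdir(parents=True)
    (template / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
    (template / "src").mkdir()
    (template / "src" / "app.py").write_text("print('hi')\n")

    # Hypothetical destination shaped like ~/.agentv/workspaces/<eval-run-id>/shared/
    workspace = copy_template(template, Path(tmp, "workspaces", "run-123", "shared"))
    copied = sorted(p.relative_to(workspace).as_posix() for p in workspace.rglob("*"))

print(copied)  # ['src', 'src/app.py']
```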

Use workspace.hooks to run commands and apply reset/cleanup policies at different lifecycle points. Hooks can be defined at the suite level (applying to all tests) or per test (overriding the suite-level hooks).

workspace:
  template: ./workspace-templates/my-project
  hooks:
    before_all:
      command: ["bun", "run", "setup.ts"]
      timeout_ms: 120000
      cwd: ./scripts
    after_each:
      command: ["bun", "run", "reset.ts"]
      timeout_ms: 5000
      reset: fast
    after_all:
      command: ["bun", "run", "cleanup.ts"]
      timeout_ms: 30000

Field               Description
template            Directory to copy as workspace (alternative to target-level workspace_template)
hooks.before_all    Runs once after workspace creation, before the first test
hooks.after_all     Runs once after the last test, before cleanup
hooks.before_each   Runs before each test
hooks.after_each    Runs after each test (supports both command and reset)

Each hook config accepts:

Field        Description
command      Command array (e.g., ["bun", "run", "setup.ts"])
reset        Reset mode: none, fast, strict
clean        Cleanup mode: always, on_success, on_failure, never
timeout_ms   Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks)
cwd          Working directory (relative paths resolved against the eval file's directory)
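A sketch of validating one hook config against the fields above and filling in its default timeout. The table does not say which hooks count as "setup" and which as "teardown"; this sketch assumes `before_*` hooks are setup and `after_*` hooks are teardown. `normalize_hook` and the `prep.ts` script are hypothetical.

```python
SETUP_DEFAULT_MS = 60_000
TEARDOWN_DEFAULT_MS = 30_000
HOOK_NAMES = {"before_all", "before_each", "after_each", "after_all"}
RESET_MODES = {"none", "fast", "strict"}
CLEAN_MODES = {"always", "on_success", "on_failure", "never"}


def normalize_hook(name: str, hook: dict) -> dict:
    """Validate one hook config and fill in timeout_ms if missing (sketch)."""
    if name not in HOOK_NAMES:
        raise ValueError(f"unknown hook: {name}")
    if "reset" in hook and hook["reset"] not in RESET_MODES:
        raise ValueError(f"hooks.{name}.reset must be one of {sorted(RESET_MODES)}")
    if "clean" in hook and hook["clean"] not in CLEAN_MODES:
        raise ValueError(f"hooks.{name}.clean must be one of {sorted(CLEAN_MODES)}")
    if "command" in hook and not isinstance(hook["command"], list):
        raise ValueError(f"hooks.{name}.command must be an array of strings")
    # Assumption: before_* = setup hooks, after_* = teardown hooks.
    default = SETUP_DEFAULT_MS if name.startswith("before_") else TEARDOWN_DEFAULT_MS
    # An explicit timeout_ms in the hook overrides the default.
    return {"timeout_ms": default, **hook}


print(normalize_hook("before_each", {"command": ["bun", "run", "prep.ts"]})["timeout_ms"])  # 60000
print(normalize_hook("after_each", {"reset": "fast"})["timeout_ms"])  # 30000
print(normalize_hook("before_all", {"command": ["bun", "run", "setup.ts"], "timeout_ms": 120000})["timeout_ms"])  # 120000
```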

Lifecycle order: template copy → hooks.before_all → git baseline → (hooks.before_each → agent runs → file changes captured → hooks.after_each) × N tests → hooks.after_all → cleanup
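To make the per-test loop in that ordering concrete, this hypothetical helper just lists the documented steps for a given set of test ids; it runs nothing and is not AgentV code.

```python
def lifecycle(test_ids: list[str]) -> list[str]:
    """List the documented lifecycle steps, in order, for one suite run."""
    steps = ["template copy", "hooks.before_all", "git baseline"]
    for test_id in test_ids:
        # The bracketed part of the lifecycle repeats once per test.
        steps += [
            f"hooks.before_each ({test_id})",
            f"agent runs ({test_id})",
            f"file changes captured ({test_id})",
            f"hooks.after_each ({test_id})",
        ]
    steps += ["hooks.after_all", "cleanup"]
    return steps


for step in lifecycle(["case-01", "case-02"]):
    print(step)
```

Note that `hooks.before_all` and the git baseline happen once, before any test, so changes made by the setup hook are part of the baseline rather than of any test's captured diff.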

Shared workspace: By default (isolation: shared), the workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset (fast or strict) to reset state between tests.

Error handling:

  • hooks.before_all / hooks.before_each command failure aborts the test with an error result
  • hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
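The failure policy above can be sketched with `subprocess.run`: a failing `before_*` hook raises so the caller can record an error result, while a failing `after_*` hook only yields a warning. `run_hook` and `HookFailed` are hypothetical names, treating a timeout as a failure is an assumption, and passing the JSON case context on stdin is omitted to keep the sketch short.

```python
import subprocess
import sys
from typing import Optional


class HookFailed(Exception):
    """A before_* hook failed, so the test should get an error result."""


def run_hook(name: str, command: list[str], timeout_ms: int) -> Optional[str]:
    """Run one hook command and apply the documented failure policy.

    Returns None on success, a warning string for a failed after_* hook,
    and raises HookFailed for a failed before_* hook.
    """
    try:
        result = subprocess.run(command, timeout=timeout_ms / 1000)
    except subprocess.TimeoutExpired:
        detail = f"timed out after {timeout_ms} ms"
    else:
        if result.returncode == 0:
            return None
        detail = f"exit code {result.returncode}"
    message = f"hooks.{name} failed: {detail}"
    if name.startswith("before_"):
        raise HookFailed(message)
    return message  # non-fatal: the caller logs it as a warning


failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_hook("after_each", failing, 5000))  # hooks.after_each failed: exit code 3
try:
    run_hook("before_each", failing, 5000)
except HookFailed as err:
    print("test aborted:", err)
```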

Script context: All hook scripts receive a JSON object on stdin describing the current case:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
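A hook written in Python could read that object as shown below. The field names come from the example above; the helper names and the `describe` output format are illustrative assumptions. The demo feeds the JSON through `io.StringIO`; in a real hook you would pass `sys.stdin` instead.

```python
import io
import json
from typing import TextIO


def read_case_context(stream: TextIO) -> dict:
    """Parse the JSON case context a hook receives on stdin."""
    return json.load(stream)


def describe(ctx: dict) -> str:
    """Summarize which case this hook is running for (illustrative)."""
    metadata = ctx.get("case_metadata") or {}
    return (
        f"[{ctx['eval_run_id']}] {ctx['test_id']} at {ctx['workspace_path']}"
        f" (base_commit={metadata.get('base_commit', 'n/a')})"
    )


sample = io.StringIO(json.dumps({
    "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
    "test_id": "case-01",
    "eval_run_id": "run-123",
    "case_input": "Fix the bug",
    "case_metadata": {"repo": "sympy/sympy", "base_commit": "abc123"},
}))
line = describe(read_case_context(sample))
print(line)
# [run-123] case-01 at /home/user/.agentv/workspaces/run-123/case-01 (base_commit=abc123)
```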

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
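One reading of "test-level fields replace suite-level fields" is a shallow, per-field replacement, sketched below. This is an assumption, not confirmed behavior: under it, a test that defines `hooks` replaces the suite's entire `hooks` block rather than merging individual hooks. `effective_workspace` and `seed.ts` are hypothetical.

```python
from typing import Optional


def effective_workspace(suite_ws: dict, test_ws: Optional[dict]) -> dict:
    """Combine suite-level and per-test workspace config (sketch).

    Assumption: replacement happens per top-level field, with no deep merge.
    """
    return {**suite_ws, **(test_ws or {})}


suite_ws = {
    "template": "./workspace-templates/my-project",
    "hooks": {"after_each": {"reset": "fast"}},
}
test_ws = {"hooks": {"before_each": {"command": ["bun", "run", "seed.ts"]}}}
merged = effective_workspace(suite_ws, test_ws)
print(merged)
# {'template': './workspace-templates/my-project', 'hooks': {'before_each': {'command': ['bun', 'run', 'seed.ts']}}}
```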

Clone git repositories into the workspace automatically, with caching for fast repeat runs. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: main
        ancestor: 1  # check out the parent commit
      clone:
        depth: 10  # shallow clone
    - path: ./local-copy
      source:
        type: local
        path: /home/user/projects/my-project
  hooks:
    after_each:
      reset: fast  # none | fast | strict
  isolation: shared  # shared (default) | per_test
  mode: pooled  # pooled | ephemeral | static
  static_path: /tmp/my-ws  # required when mode=static

Field                       Description
repos[].path                Directory within the workspace to clone into
repos[].source.type         git (remote URL) or local (absolute path)
repos[].checkout.ref        Branch, tag, or SHA to check out (default: HEAD)
repos[].checkout.resolve    remote (ls-remote, default for git) or local
repos[].checkout.ancestor   Walk N commits back from ref (e.g., 1 for the parent)
repos[].clone.depth         Shallow clone depth
repos[].clone.filter        Partial clone filter (e.g., blob:none)
repos[].clone.sparse        Sparse checkout paths
hooks.after_each.reset      Reset policy after each test: none, fast, strict
isolation                   shared reuses one workspace; per_test creates a fresh copy per test
mode                        Workspace mode: pooled, ephemeral, static
static_path                 Existing workspace path used when mode=static
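To show what the clone and checkout fields mean in plain git terms, the hypothetical `git_plan` helper below turns one `source.type: git` entry into a list of git commands. It is not the sequence AgentV runs: `checkout.resolve`, local sources, and pooling are not modelled, and fetch-then-detach is only one reasonable approach. It does encode one real constraint: with `clone.depth: N` the clone holds N commits, so `ancestor` must be less than N.

```python
def git_plan(repo: dict) -> list[list[str]]:
    """Translate one git-sourced repos[] entry into git commands (sketch)."""
    if repo["source"]["type"] != "git":
        raise ValueError("this sketch only handles source.type: git")
    path, url = repo["path"], repo["source"]["url"]
    clone = repo.get("clone", {})
    checkout = repo.get("checkout", {})
    depth = clone.get("depth")
    ancestor = checkout.get("ancestor", 0)

    if depth is not None and ancestor >= depth:
        # A history of `depth` commits cannot reach ref~ancestor.
        raise ValueError(f"clone.depth {depth} is too shallow for ancestor {ancestor}")

    depth_args = ["--depth", str(depth)] if depth is not None else []
    clone_cmd = ["git", "clone", "--no-checkout", *depth_args]
    if "filter" in clone:
        clone_cmd.append(f"--filter={clone['filter']}")
    plan = [clone_cmd + [url, path]]

    if clone.get("sparse"):
        plan.append(["git", "-C", path, "sparse-checkout", "set", *clone["sparse"]])

    ref = checkout.get("ref", "HEAD")
    base = "HEAD"
    if ref != "HEAD":
        # Fetch the requested branch, tag, or SHA, then build on FETCH_HEAD.
        plan.append(["git", "-C", path, "fetch", *depth_args, "origin", ref])
        base = "FETCH_HEAD"
    rev = f"{base}~{ancestor}" if ancestor else base  # ~N walks N commits back
    plan.append(["git", "-C", path, "checkout", "--detach", rev])
    return plan


repo = {
    "path": "./my-repo",
    "source": {"type": "git", "url": "https://github.com/org/repo.git"},
    "checkout": {"ref": "main", "ancestor": 1},
    "clone": {"depth": 10},
}
plan = git_plan(repo)
for cmd in plan:
    print(" ".join(cmd))
# git clone --no-checkout --depth 10 https://github.com/org/repo.git ./my-repo
# git -C ./my-repo fetch --depth 10 origin main
# git -C ./my-repo checkout --detach FETCH_HEAD~1
```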

Pooling: mode: pooled (or the default shared repo mode) reuses pool slots between runs. Use mode: ephemeral to disable pooling and get fresh clones and checkouts on each run.
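A hypothetical sketch of the mode rules: checking the allowed values, requiring `static_path` for `mode: static`, and deciding whether a workspace draws from the pool. The `uses_pool` logic reads "the default shared repo mode" as "mode unset, repos present, isolation shared"; that interpretation is an assumption.

```python
VALID_MODES = {"pooled", "ephemeral", "static"}
VALID_ISOLATION = {"shared", "per_test"}


def check_mode(ws: dict) -> list[str]:
    """Return configuration problems for mode/isolation settings (sketch)."""
    problems = []
    mode = ws.get("mode")
    if mode is not None and mode not in VALID_MODES:
        problems.append(f"unknown mode {mode!r}")
    if ws.get("isolation", "shared") not in VALID_ISOLATION:
        problems.append(f"unknown isolation {ws['isolation']!r}")
    if mode == "static" and not ws.get("static_path"):
        problems.append("static_path is required when mode=static")
    return problems


def uses_pool(ws: dict) -> bool:
    """Whether repo checkouts would come from the reusable pool (assumed rule)."""
    mode = ws.get("mode")
    if mode is not None:
        return mode == "pooled"  # ephemeral and static never pool
    return bool(ws.get("repos")) and ws.get("isolation", "shared") == "shared"


repos = [{"path": "./repo", "source": {"type": "git", "url": "https://github.com/org/repo.git"}}]
print(uses_pool({"repos": repos}))                       # True
print(uses_pool({"repos": repos, "mode": "ephemeral"}))  # False
print(check_mode({"mode": "static"}))  # ['static_path is required when mode=static']
```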

Pool management commands:

  • agentv workspace list — list all pool entries with size and repo info
  • agentv workspace clean — remove all pool entries

Common patterns:

# Pinned commit with shallow clone (fast CI runs)
workspace:
  repos:
    - path: ./repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: abc123def
      clone:
        depth: 1

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      source: { type: git, url: https://github.com/org/frontend.git }
    - path: ./backend
      source: { type: git, url: https://github.com/org/backend.git }
  hooks:
    after_each:
      reset: fast

Default finish behavior (whether the workspace is kept when a run ends):

  • Success: cleanup
  • Failure: keep

CLI overrides:

  • --retain-on-success keep|cleanup
  • --retain-on-failure keep|cleanup
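The retention rules reduce to a small decision function. The two optional parameters are hypothetical stand-ins for the `--retain-on-success` and `--retain-on-failure` flags (None means the flag was not passed); whether "success" is judged per test or per run is not specified here, so the sketch simply takes a boolean.

```python
from typing import Optional

# Documented defaults: clean up after success, keep the workspace after failure.
DEFAULTS = {True: "cleanup", False: "keep"}


def finish_action(succeeded: bool,
                  retain_on_success: Optional[str] = None,
                  retain_on_failure: Optional[str] = None) -> str:
    """Return "keep" or "cleanup" for a finished workspace (sketch)."""
    override = retain_on_success if succeeded else retain_on_failure
    if override is None:
        return DEFAULTS[succeeded]
    if override not in ("keep", "cleanup"):
        raise ValueError(f"expected keep|cleanup, got {override!r}")
    return override


print(finish_action(True))                              # cleanup
print(finish_action(False))                             # keep
print(finish_action(True, retain_on_success="keep"))    # keep
```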

Option               Use case
cwd                  Run in an existing directory (shared across tests)
workspace_template   Copy template to temp location (isolated per case)

These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
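These rules can be expressed as a small check, assuming `cwd` and `workspace_template` are both target-level fields as in the targets examples above. `working_dir_source` and the eval file name are hypothetical, and the sketch does not resolve relative paths.

```python
from pathlib import Path


def working_dir_source(target: dict, eval_file: Path) -> tuple[str, Path]:
    """Decide where an agent target runs, following the rules above (sketch).

    Returns ("cwd", dir), ("workspace_template", template_dir) or
    ("eval_dir", directory containing the eval file).
    """
    has_cwd = "cwd" in target
    has_template = "workspace_template" in target
    if has_cwd and has_template:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    if has_cwd:
        return "cwd", Path(target["cwd"])
    if has_template:
        # The agent runs in a temporary copy of this directory, not in it.
        return "workspace_template", Path(target["workspace_template"])
    return "eval_dir", eval_file.parent


eval_file = Path("evals/my-suite.eval.yaml")
print(working_dir_source({"name": "local_agent", "provider": "cli"}, eval_file))
```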