Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml, which keeps eval files independent of provider details such as endpoints, API keys, and models.

Structure

targets:
  - name: azure-base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}

  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt {PROMPT}'
    judge_target: azure-base
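The local_agent target runs agent.py, which is a program you supply. Below is a minimal Python stand-in for it. It assumes that the cli provider replaces {PROMPT} with the test's prompt text and reads the agent's answer from stdout; confirm both against the provider's actual behavior. respond() is a hypothetical placeholder for real agent logic.

```python
"""agent.py: hypothetical minimal agent for a `cli` target (illustrative only)."""
from __future__ import annotations

import argparse
import sys


def respond(prompt: str) -> str:
    # Hypothetical placeholder: call your real agent or model here.
    return f"Received prompt: {prompt}"


def run(argv: list[str]) -> str:
    parser = argparse.ArgumentParser(description="Toy agent for the cli provider")
    # nargs="+" tolerates a prompt that arrives split into several words,
    # which happens if the substituted {PROMPT} is not shell-quoted.
    parser.add_argument("--prompt", nargs="+", required=True)
    args = parser.parse_args(argv)
    return respond(" ".join(args.prompt))


if __name__ == "__main__":
    # Assumption: agentv takes the agent's answer from stdout.
    print(run(sys.argv[1:]))
```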

Environment Variables

Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files, as long as the .env file itself is not committed.
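The following Python sketch shows one way to resolve ${{ NAME }} placeholders against a mapping of variables, such as os.environ after your .env file has been loaded. It only illustrates the syntax and is not agentv's resolver. interpolate is a hypothetical helper, and raising an error for an undefined variable is an assumed policy.

```python
from __future__ import annotations

import re
from collections.abc import Mapping

# Matches ${{ NAME }}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${{ NAME }} in value with env[NAME].

    Raises KeyError for an undefined variable (assumed policy; agentv's
    handling of missing variables may differ).
    """
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(lookup, value)
```

Usage: interpolate("${{ ANTHROPIC_MODEL }}", os.environ) returns the value of ANTHROPIC_MODEL, or raises if it is unset.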

Supported Providers

Provider	Type	Description
`azure`	LLM	Azure OpenAI
`anthropic`	LLM	Anthropic Claude API
`gemini`	LLM	Google Gemini
`claude`	Agent	Claude Agent SDK
`codex`	Agent	Codex CLI
`pi-coding-agent`	Agent	Pi Coding Agent
`vscode`	Agent	VS Code with Copilot
`vscode-insiders`	Agent	VS Code Insiders
`cli`	Agent	Any CLI command
`mock`	Testing	Mock provider for dry runs

Referencing Targets in Evals

Set a default target at the top level, or override it for individual tests:

# Top-level default
execution:
  target: azure-base

tests:
  - id: test-1
    # Uses azure-base

  - id: test-2
    execution:
      target: vscode_dev  # Override for this case
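The precedence rule can be expressed as a small function over the parsed eval file, for example YAML loaded into dicts. resolve_target is a hypothetical helper for illustration and is not part of agentv.

```python
from __future__ import annotations

from typing import Any, Optional


def resolve_target(suite: dict[str, Any], test: dict[str, Any]) -> Optional[str]:
    """Return the target for one test: its own execution.target if set,
    otherwise the suite-level default (None if neither is set)."""
    test_target = (test.get("execution") or {}).get("target")
    if test_target:
        return test_target
    return (suite.get("execution") or {}).get("target")
```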

Judge Target

Agent targets that need LLM-based evaluation specify a judge_target, which names the LLM target that runs the LLM judge evaluators:

targets:
  - name: codex_target
    provider: codex
    judge_target: azure-base  # LLM used for judging
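Because judge_target refers to another target by name, a typo shows up only when judging runs. A hypothetical pre-flight check like the one below catches dangling references earlier. This sketch treats only the LLM providers from the table above as valid judges, which is an assumption. A mock judge used for dry runs, for example, might be legitimate.

```python
from __future__ import annotations

from typing import Any

# Assumption: only the providers listed as "LLM" in the table can judge.
LLM_PROVIDERS = {"azure", "anthropic", "gemini"}


def check_judge_targets(targets: list[dict[str, Any]]) -> list[str]:
    """Return problems found in judge_target references (empty if none)."""
    by_name = {t["name"]: t for t in targets}
    problems = []
    for t in targets:
        judge = t.get("judge_target")
        if judge is None:
            continue
        if judge not in by_name:
            problems.append(f"{t['name']}: judge_target '{judge}' is not defined")
        elif by_name[judge].get("provider") not in LLM_PROVIDERS:
            problems.append(f"{t['name']}: judge_target '{judge}' is not an LLM target")
    return problems
```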

Workspace Template

For agent targets, workspace_template specifies a directory that is copied to a temporary location before the tests run. As a result, every run starts from the same isolated, reproducible state.

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

When workspace_template is set:

The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/shared/
The .git directory is skipped during copy
Tests share the workspace; use hooks.after_each to reset state between tests
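The copy step can be approximated with the Python standard library, since shutil.copytree combined with shutil.ignore_patterns skips .git directories. copy_template is a hypothetical helper. ignore_patterns matches names at every depth, so this sketch also skips nested .git directories, and that may differ from what agentv does.

```python
import shutil
from pathlib import Path


def copy_template(template: Path, dest: Path) -> Path:
    """Copy a workspace template into dest, skipping .git directories.

    dest (e.g. a run's .../shared/ directory) must not exist yet; copytree
    creates it along with any missing parent directories.
    """
    shutil.copytree(template, dest, ignore=shutil.ignore_patterns(".git"))
    return dest
```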

Workspace Lifecycle Hooks

Use workspace.hooks to run commands and apply reset/cleanup policies at specific lifecycle points. Hooks can be defined at the suite level, where they apply to all tests, or per test, where they override the suite-level configuration.

workspace:
  template: ./workspace-templates/my-project
  hooks:
    before_all:
      command: ["bun", "run", "setup.ts"]
      timeout_ms: 120000
      cwd: ./scripts
    after_each:
      command: ["bun", "run", "reset.ts"]
      timeout_ms: 5000
      reset: fast
    after_all:
      command: ["bun", "run", "cleanup.ts"]
      timeout_ms: 30000

Field	Description
`template`	Directory to copy as workspace (alternative to target-level `workspace_template`)
`hooks.before_all`	Runs once after workspace creation, before the first test
`hooks.after_all`	Runs once after the last test, before cleanup
`hooks.before_each`	Runs before each test
`hooks.after_each`	Runs after each test (supports both `command` and `reset`)

Each hook config accepts:

Field	Description
`command`	Command array (e.g., `["bun", "run", "setup.ts"]`)
`reset`	Reset mode: `none`, `fast`, `strict`
`clean`	Cleanup mode: `always`, `on_success`, `on_failure`, `never`
`timeout_ms`	Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks)
`cwd`	Working directory (relative paths resolved against eval file directory)

Lifecycle order: template copy → hooks.before_all → git baseline → (hooks.before_each → agent runs → file changes captured → hooks.after_each) × N tests → hooks.after_all → cleanup

Shared workspace: By default, the workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset (fast or strict) to reset state between tests.

Error handling:

hooks.before_all / hooks.before_each command failure aborts the test with an error result
hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
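The sketch below simulates one suite run and records each step, which makes the ordering and failure rules concrete. It is a hypothetical model, not agentv code. Two behaviors are assumptions and are marked in the comments: what happens after a before_all failure, and whether after_each still runs when before_each fails.

```python
from __future__ import annotations

from typing import Callable

Hook = Callable[[], None]


def run_suite(tests: list[str], hooks: dict[str, Hook]) -> tuple[list[str], dict[str, str]]:
    """Simulate the documented lifecycle order and hook failure rules.

    Returns (trace of steps, result per test). Hypothetical model only.
    """
    trace: list[str] = ["template copy"]
    results: dict[str, str] = {}

    def call(name: str) -> bool:
        """Run a hook if configured; return False if it raised."""
        hook = hooks.get(name)
        if hook is None:
            return True
        trace.append(name)
        try:
            hook()
            return True
        except Exception:
            if name.startswith("after_"):
                trace.append(f"warning: {name} failed")  # non-fatal
            return False

    if not call("before_all"):
        # Assumption: every test becomes an error; cleanup is not modeled here.
        return trace, {t: "error" for t in tests}
    trace.append("git baseline")
    for test in tests:
        if call("before_each"):
            trace.append(f"agent runs {test}")
            trace.append(f"capture changes {test}")
            results[test] = "ran"
        else:
            results[test] = "error"  # before_each failure aborts this test
        call("after_each")  # assumption: still runs after a before_each failure
    call("after_all")
    trace.append("cleanup")
    return trace, results
```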

Script context: Every hook command receives a JSON object on stdin describing the current test:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
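Hook commands can be written in any language. As an illustration, here is how a Python hook registered as command: ["python3", "reset.py"] could parse the context. The file name and helper functions are hypothetical. Suite-level hooks may not receive per-test fields such as test_id, so the sketch reads those fields defensively with .get().

```python
"""reset.py: hypothetical hook script that reads the test context from stdin."""
from __future__ import annotations

import json
import sys
from typing import Any


def load_context(raw: str) -> dict[str, Any]:
    """Parse the JSON context object that the hook receives on stdin."""
    ctx = json.loads(raw)
    if not isinstance(ctx, dict):
        raise ValueError("expected a JSON object on stdin")
    return ctx


def describe(ctx: dict[str, Any]) -> str:
    # .get() because suite-level hooks may lack per-test fields (assumption).
    run_id = ctx.get("eval_run_id", "?")
    test_id = ctx.get("test_id", "<suite>")
    return f"[{run_id}] {test_id}: workspace {ctx.get('workspace_path')}"


if __name__ == "__main__":
    context = load_context(sys.stdin.read())
    # Diagnostics go to stderr; the real reset work would go here.
    print(describe(context), file=sys.stderr)
```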

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
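One way to read "test-level fields replace suite-level fields" is as a shallow merge on the top-level workspace keys, sketched below. The granularity is an assumption. If agentv merges hooks per hook name instead, the result would differ, so check Per-Test Workspace Config before relying on this. effective_workspace is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def effective_workspace(suite: dict[str, Any], test: dict[str, Any]) -> dict[str, Any]:
    """Merge workspace configs: any top-level field the test defines wins."""
    # Shallow replacement (assumed): a test-level `hooks` block replaces the
    # entire suite-level `hooks` block rather than being merged key by key.
    return {**suite, **test}
```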

Repository Lifecycle

Clone git repositories into the workspace automatically, with caching for fast repeat runs. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: main
        ancestor: 1          # check out the parent commit
      clone:
        depth: 10             # shallow clone
    - path: ./local-copy
      source:
        type: local
        path: /home/user/projects/my-project
  hooks:
    after_each:
      reset: fast             # none | fast | strict
  isolation: shared           # shared (default) | per_test
  mode: pooled                # pooled | ephemeral | static
  static_path: /tmp/my-ws     # required when mode=static

Field	Description
`repos[].path`	Directory within the workspace to clone into
`repos[].source.type`	`git` (remote URL) or `local` (absolute path)
`repos[].checkout.ref`	Branch, tag, or SHA to check out (default: `HEAD`)
`repos[].checkout.resolve`	`remote` (ls-remote, default for git) or `local`
`repos[].checkout.ancestor`	Walk N commits back from ref (e.g., `1` for parent)
`repos[].clone.depth`	Shallow clone depth
`repos[].clone.filter`	Partial clone filter (e.g., `blob:none`)
`repos[].clone.sparse`	Sparse checkout paths
`hooks.after_each.reset`	Reset policy after each test: `none`, `fast`, `strict`
`isolation`	`shared` reuses one workspace; `per_test` creates a fresh copy per test
`mode`	Workspace mode: `pooled`, `ephemeral`, `static`
`static_path`	Existing workspace path used when `mode=static`
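To show what the clone and checkout fields mean in plain git terms, the sketch below builds roughly equivalent git commands for one repos[] entry. agentv caches and pools clones and may issue different commands; git_commands is hypothetical and does not cover resolve or sparse. A shallow clone must also reach past the ancestor commit: with depth 1 only the tip commit is fetched, so ref~1 would not exist.

```python
from __future__ import annotations

from typing import Optional


def git_commands(url: str, path: str, ref: str = "HEAD", ancestor: int = 0,
                 depth: Optional[int] = None,
                 filter_spec: Optional[str] = None) -> list[list[str]]:
    """Build plain git commands that roughly mirror one repos[] entry.

    Assumes ref is reachable in the cloned history (e.g. the default
    branch, or a SHA within the fetched depth).
    """
    clone = ["git", "clone", "--no-checkout"]
    if depth is not None:
        # The fetched history must include the commit `ancestor` steps back.
        if depth <= ancestor:
            raise ValueError("clone.depth must be greater than checkout.ancestor")
        clone += ["--depth", str(depth)]
    if filter_spec:
        clone.append(f"--filter={filter_spec}")
    clone += [url, path]
    # ref~N names the commit N first-parent steps back from ref.
    target = f"{ref}~{ancestor}" if ancestor else ref
    return [clone, ["git", "-C", path, "checkout", "--detach", target]]
```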

Pooling: mode: pooled (also the default for shared workspaces that define repos) reuses pool slots between runs. Use mode: ephemeral to disable pooling and get a fresh clone and checkout on every run.
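The value rules for isolation, mode, and static_path can be checked before a run starts. The function below is a hypothetical validator built only from the allowed values and the "required when mode=static" rule documented above. agentv's own validation may report more, or differently.

```python
from __future__ import annotations

from typing import Any

ISOLATION_VALUES = {"shared", "per_test"}
MODE_VALUES = {"pooled", "ephemeral", "static"}


def check_workspace(ws: dict[str, Any]) -> list[str]:
    """Return problems with the isolation/mode/static_path fields."""
    problems = []
    isolation = ws.get("isolation", "shared")  # documented default
    if isolation not in ISOLATION_VALUES:
        problems.append(f"unknown isolation: {isolation}")
    mode = ws.get("mode")
    if mode is not None and mode not in MODE_VALUES:
        problems.append(f"unknown mode: {mode}")
    if mode == "static" and not ws.get("static_path"):
        problems.append("static_path is required when mode is static")
    return problems
```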

Pool management commands:

agentv workspace list — list all pool entries with size and repo info
agentv workspace clean — remove all pool entries

Common patterns:

# Pinned commit with shallow clone (fast CI runs)
workspace:
  repos:
    - path: ./repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: abc123def
      clone:
        depth: 1

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      source: { type: git, url: https://github.com/org/frontend.git }
    - path: ./backend
      source: { type: git, url: https://github.com/org/backend.git }
  hooks:
    after_each:
      reset: fast

Cleanup Behavior

Default behavior when a run finishes:

Success: cleanup
Failure: keep

CLI overrides:

--retain-on-success keep|cleanup
--retain-on-failure keep|cleanup
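The finish rule reduces to a small decision: if the CLI flag for the run's outcome is set, it wins; otherwise the default applies. finish_action is hypothetical. The sketch does not model how agentv decides that a run succeeded (for example, whether every test must pass).

```python
from __future__ import annotations

from typing import Optional

# Documented defaults: success -> cleanup, failure -> keep.
DEFAULTS = {True: "cleanup", False: "keep"}


def finish_action(success: bool, retain_on_success: Optional[str] = None,
                  retain_on_failure: Optional[str] = None) -> str:
    """Return "keep" or "cleanup" for a finished run, applying CLI overrides."""
    for value in (retain_on_success, retain_on_failure):
        if value not in (None, "keep", "cleanup"):
            raise ValueError(f"expected keep or cleanup, got {value!r}")
    override = retain_on_success if success else retain_on_failure
    return override or DEFAULTS[success]
```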

cwd vs workspace_template

Option	Use Case
`cwd`	Run in an existing directory (shared across tests)
`workspace_template`	Copy the template to a temp location (isolated from the source; per-test copies require isolation: per_test)

These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
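The selection rule can be written as a short function. working_directory is hypothetical. The workspace argument stands for the temporary copy made from workspace_template, and the sketch returns cwd unchanged, without modeling how a relative cwd is resolved.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Optional


def working_directory(target: dict[str, Any], eval_file: Path,
                      workspace: Optional[Path] = None) -> Path:
    """Pick the agent's working directory from a target definition."""
    has_cwd = "cwd" in target
    has_template = "workspace_template" in target
    if has_cwd and has_template:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    if has_cwd:
        return Path(target["cwd"])  # existing directory, shared across tests
    if has_template:
        if workspace is None:
            raise ValueError("workspace_template is set but no workspace was created")
        return workspace  # temporary copy of the template
    return eval_file.parent  # fallback: the eval file's directory
```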