Run evaluation files against Braintrust. Supports JavaScript and Python.
`bt eval` is currently macOS and Linux only. Windows support is planned.
## File selection
- `bt eval` — discover and run all eval files in the current directory (recursive)
- `bt eval tests/` — discover eval files under a specific directory
- `bt eval "tests/**/*.eval.ts"` — glob pattern
- `bt eval a.eval.ts b.eval.ts` — one or more explicit files
Files inside `node_modules`, `.venv`, `venv`, `site-packages`, `dist-packages`, and `__pycache__` are excluded from automatic discovery. Explicit paths and globs bypass these exclusions.
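The exclusion rule above amounts to a filter on path segments. A minimal sketch of that logic (a hypothetical illustration based on the directory list in this doc, not the CLI's actual source):

```typescript
// Directories skipped during automatic discovery, per the list above.
const EXCLUDED_DIRS = new Set([
  "node_modules", ".venv", "venv",
  "site-packages", "dist-packages", "__pycache__",
]);

// Returns true if any path segment names an excluded directory.
export function isExcludedFromDiscovery(filePath: string): boolean {
  return filePath.split(/[\\/]/).some((segment) => EXCLUDED_DIRS.has(segment));
}
```

Explicit paths and globs would skip this check entirely, matching the behavior described above.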
## JavaScript runners
By default, `bt eval` auto-detects a runner from your project (`tsx`, `vite-node`, `ts-node`, then `ts-node-esm`). Set one explicitly with `--runner` / `BT_EVAL_RUNNER`:
```shell
bt eval --runner vite-node tutorial.eval.ts
bt eval --runner tsx tutorial.eval.ts
```
`bt eval` automatically resolves locally installed binaries from `node_modules/.bin`, so you can write `--runner tsx` instead of `--runner ./node_modules/.bin/tsx` (for example). If you see ESM or top-level await errors, try `--runner vite-node`.
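The lookup order described above can be sketched as a pure function (hypothetical names and structure, not the CLI's actual implementation): prefer a local `node_modules/.bin` binary, then fall back to `PATH`.

```typescript
// Auto-detection order described above (hypothetical constant name).
export const AUTO_DETECT_ORDER = ["tsx", "vite-node", "ts-node", "ts-node-esm"];

// Resolve a runner name to an executable path. `localBins` stands in for the
// entries of ./node_modules/.bin and `pathBins` for executables found on PATH.
export function resolveRunner(
  name: string,
  localBins: string[],
  pathBins: string[],
): string | undefined {
  if (localBins.includes(name)) return `./node_modules/.bin/${name}`;
  if (pathBins.includes(name)) return name; // fall back to PATH lookup
  return undefined; // runner not installed anywhere
}
```

With no explicit `--runner`, auto-detection would try each entry of `AUTO_DETECT_ORDER` through the same resolution and use the first hit.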
## Python
`bt eval` also runs Python eval files. Use `--language py` to force Python rather than relying on detection. By default, if `VIRTUAL_ENV` is set, `bt` uses that virtualenv's Python; otherwise it searches `PATH` for `python3` or `python`. To use a specific interpreter, set `BT_EVAL_PYTHON_RUNNER` to its name or path (e.g. `python3.11`). The `--num-workers` flag controls concurrency for Python execution.
```shell
bt eval my_eval.py
bt eval --language py --num-workers 4 my_eval.py
```
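The interpreter selection order above can be sketched as follows (a hypothetical illustration of the documented precedence, not the CLI's actual source; the `bin/python` layout assumes a POSIX virtualenv, which matches the macOS/Linux-only support noted earlier):

```typescript
interface Env {
  BT_EVAL_PYTHON_RUNNER?: string;
  VIRTUAL_ENV?: string;
}

// Pick a Python interpreter per the documented order:
// explicit override, then active virtualenv, then PATH search.
export function pickPythonInterpreter(env: Env, onPath: string[]): string | undefined {
  if (env.BT_EVAL_PYTHON_RUNNER) return env.BT_EVAL_PYTHON_RUNNER; // 1. override
  if (env.VIRTUAL_ENV) return `${env.VIRTUAL_ENV}/bin/python`;     // 2. virtualenv
  return ["python3", "python"].find((p) => onPath.includes(p));    // 3. PATH
}
```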
## Flags
| Flag | Env var | Description |
|---|---|---|
| `--runner <RUNNER>` | `BT_EVAL_RUNNER` | Runner binary (`tsx`, `bun`, `ts-node`, `python`, etc.) |
| `--language <LANG>` | `BT_EVAL_LANGUAGE` | Force language: `js` or `py` |
| `--filter <PATTERN>` | `BT_EVAL_FILTER` | Run only evaluators matching the pattern |
| `--watch` / `-w` | `BT_EVAL_WATCH` | Re-run when input files change |
| `--no-send-logs` | `BT_EVAL_LOCAL` | Run without sending results to Braintrust |
| `--num-workers <N>` | | Worker threads for Python execution |
| `--verbose` | | Show full errors and stderr from eval files |
| `--list` | | List evaluators without running them |
| `--jsonl` | | Output one JSON summary per evaluator (for scripts). See also the global `--json` flag (overview), which formats all CLI output as JSON rather than per-evaluator summaries. |
| `--terminate-on-failure` | | Stop after the first failing evaluator |
| `--dev` | | Start a local web server for browser-based eval development (default port: 8300) |
## Passing arguments to the eval file
Use `--` to forward extra arguments to the eval file via `process.argv`:
```shell
bt eval foo.eval.ts -- --description "Prod" --shard 1/4
```
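Inside the eval file, the forwarded arguments appear after the script path in `process.argv`. A minimal sketch of reading them (the parsing helper is hypothetical; the flag names match the example above):

```typescript
// Hypothetical helper: collect `--flag value` pairs from forwarded arguments.
export function parseForwardedArgs(argv: string[]): Record<string, string> {
  const flags: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i].startsWith("--") && i + 1 < argv.length) {
      flags[argv[i].slice(2)] = argv[i + 1];
      i++; // skip the consumed value
    }
  }
  return flags;
}

// In an eval file you might write:
// const { description, shard } = parseForwardedArgs(process.argv.slice(2));
```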
## Running in CI
Set `BRAINTRUST_API_KEY` instead of using OAuth login:
```yaml
# GitHub Actions example
- name: Run evals
  env:
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
  run: bt eval tests/
```
Use `--no-input` and `--json` for non-interactive output:
```shell
BRAINTRUST_API_KEY=... bt eval tests/ --no-input --json
```