Run evaluation files against Braintrust. Supports JavaScript and Python.
`bt eval` is currently macOS and Linux only. Windows support is planned.
## File selection
- `bt eval` — discover and run all eval files in the current directory (recursive)
- `bt eval tests/` — discover eval files under a specific directory
- `bt eval "tests/**/*.eval.ts"` — glob pattern
- `bt eval a.eval.ts b.eval.ts` — one or more explicit files
Files inside `node_modules`, `.venv`, `venv`, `site-packages`, `dist-packages`, and `__pycache__` are excluded from automatic discovery. Explicit paths and globs bypass these exclusions.
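The exclusion rule above amounts to a filter on path segments. A minimal sketch of that logic (a hypothetical illustration based on the directory list in this doc, not the CLI's actual source):

```typescript
// Directories skipped during automatic discovery, per the list above.
const EXCLUDED_DIRS = new Set([
  "node_modules", ".venv", "venv",
  "site-packages", "dist-packages", "__pycache__",
]);

// Returns true if any path segment names an excluded directory.
export function isExcludedFromDiscovery(filePath: string): boolean {
  return filePath.split(/[\\/]/).some((segment) => EXCLUDED_DIRS.has(segment));
}
```

Explicit paths and globs would skip this check entirely, matching the behavior described above.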
## JavaScript runners
By default, `bt eval` auto-detects a runner from your project (`tsx`, `vite-node`, `ts-node`, then `ts-node-esm`). Set one explicitly with `--runner` / `BT_EVAL_RUNNER`:
```shell
bt eval --runner vite-node tutorial.eval.ts
bt eval --runner tsx tutorial.eval.ts
```
`bt eval` automatically resolves locally installed binaries from `node_modules/.bin`, so you can write `--runner tsx` instead of `--runner ./node_modules/.bin/tsx` (for example). If you see ESM or top-level await errors, try `--runner vite-node`.
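The lookup order described above can be sketched as a pure function (hypothetical names and structure, not the CLI's actual implementation): prefer a local `node_modules/.bin` binary, then fall back to `PATH`.

```typescript
// Auto-detection order described above (hypothetical constant name).
export const AUTO_DETECT_ORDER = ["tsx", "vite-node", "ts-node", "ts-node-esm"];

// Resolve a runner name to an executable path. `localBins` stands in for the
// entries of ./node_modules/.bin and `pathBins` for executables found on PATH.
export function resolveRunner(
  name: string,
  localBins: string[],
  pathBins: string[],
): string | undefined {
  if (localBins.includes(name)) return `./node_modules/.bin/${name}`;
  if (pathBins.includes(name)) return name; // fall back to PATH lookup
  return undefined; // runner not installed anywhere
}
```

With no explicit `--runner`, auto-detection would try each entry of `AUTO_DETECT_ORDER` through the same resolution and use the first hit.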
## Python
`bt eval` also runs Python eval files. Use `--language py` to force Python rather than relying on detection. By default, if `VIRTUAL_ENV` is set, `bt` uses that virtualenv's Python; otherwise it searches `PATH` for `python3` or `python`. To use a specific interpreter, set `BT_EVAL_PYTHON_RUNNER` to its name or path (e.g. `python3.11`). The `--num-workers` flag controls concurrency for Python execution.
```shell
bt eval my_eval.py
bt eval --language py --num-workers 4 my_eval.py
```
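The interpreter selection order above can be sketched as follows (a hypothetical illustration of the documented precedence, not the CLI's actual source; the `bin/python` layout assumes a POSIX virtualenv, which matches the macOS/Linux-only support noted earlier):

```typescript
interface Env {
  BT_EVAL_PYTHON_RUNNER?: string;
  VIRTUAL_ENV?: string;
}

// Pick a Python interpreter per the documented order:
// explicit override, then active virtualenv, then PATH search.
export function pickPythonInterpreter(env: Env, onPath: string[]): string | undefined {
  if (env.BT_EVAL_PYTHON_RUNNER) return env.BT_EVAL_PYTHON_RUNNER; // 1. override
  if (env.VIRTUAL_ENV) return `${env.VIRTUAL_ENV}/bin/python`;     // 2. virtualenv
  return ["python3", "python"].find((p) => onPath.includes(p));    // 3. PATH
}
```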
## Flags
| Flag | Env var | Description |
|---|---|---|
| `--runner <RUNNER>` | `BT_EVAL_RUNNER` | Runner binary (`tsx`, `bun`, `ts-node`, `python`, etc.) |
| `--language <LANG>` | `BT_EVAL_LANGUAGE` | Force language: `js` or `py` |
| `--filter <PATTERN>` | `BT_EVAL_FILTER` | Run only evaluators matching the pattern |
| `--watch` / `-w` | `BT_EVAL_WATCH` | Re-run when input files change |
| `--no-send-logs` | `BT_EVAL_LOCAL` | Run without sending results to Braintrust |
| `--num-workers <N>` | | Worker threads for Python execution |
| `--verbose` | | Show full errors and stderr from eval files |
| `--list` | | List evaluators without running them |
| `--jsonl` | | Output one JSON summary per evaluator (for scripts). See also the global `--json` flag (overview), which formats all CLI output as JSON rather than per-evaluator summaries. |
| `--terminate-on-failure` | | Stop after the first failing evaluator |
| `--dev` | | Start a local web server for browser-based eval development (default port: 8300) |
## Passing arguments to the eval file
Use `--` to forward extra arguments to the eval file via `process.argv`:
```shell
bt eval foo.eval.ts -- --description "Prod" --shard 1/4
```
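Inside the eval file, the forwarded arguments appear after the script path in `process.argv`. A minimal sketch of reading them (the parsing helper is hypothetical; the flag names match the example above):

```typescript
// Hypothetical helper: collect `--flag value` pairs from forwarded arguments.
export function parseForwardedArgs(argv: string[]): Record<string, string> {
  const flags: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i].startsWith("--") && i + 1 < argv.length) {
      flags[argv[i].slice(2)] = argv[i + 1];
      i++; // skip the consumed value
    }
  }
  return flags;
}

// In an eval file you might write:
// const { description, shard } = parseForwardedArgs(process.argv.slice(2));
```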
## Running in CI
Set `BRAINTRUST_API_KEY` instead of using OAuth login:
```yaml
# GitHub Actions example
- name: Run evals
  env:
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
  run: bt eval tests/
```
Use `--no-input` and `--json` for non-interactive output:
```shell
BRAINTRUST_API_KEY=... bt eval tests/ --no-input --json
```