Run evaluation files against Braintrust. Supports JavaScript and Python.
bt eval is currently macOS and Linux only. Windows support is planned.

File selection

  • bt eval — discover and run all eval files in the current directory (recursive)
  • bt eval tests/ — discover eval files under a specific directory
  • bt eval "tests/**/*.eval.ts" — glob pattern
  • bt eval a.eval.ts b.eval.ts — one or more explicit files
Files inside node_modules, .venv, venv, site-packages, dist-packages, and __pycache__ are excluded from automatic discovery. Explicit paths and globs bypass these exclusions.
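The exclusion rules above can be sketched roughly as follows. This is an illustration of the documented behavior, not the CLI's actual implementation (which also matches Python eval files and may use different internals):

```python
from pathlib import Path

# Directories skipped during automatic discovery, per the list above.
EXCLUDED = {"node_modules", ".venv", "venv", "site-packages",
            "dist-packages", "__pycache__"}

def discover_eval_files(root: str) -> list[Path]:
    """Recursively find *.eval.ts files, skipping excluded directories.

    Sketch only: explicit paths and globs passed on the command line
    bypass these exclusions entirely.
    """
    root_path = Path(root)
    results = []
    for path in sorted(root_path.rglob("*.eval.ts")):
        # Check every directory component below the search root.
        if not any(part in EXCLUDED for part in path.relative_to(root_path).parts):
            results.append(path)
    return results
```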

JavaScript runners

By default, bt eval auto-detects a runner from your project (tsx, vite-node, ts-node, then ts-node-esm). Set one explicitly with --runner / BT_EVAL_RUNNER:
bt eval --runner vite-node tutorial.eval.ts
bt eval --runner tsx tutorial.eval.ts
bt eval automatically resolves locally installed binaries from node_modules/.bin, so you can write, for example, --runner tsx instead of --runner ./node_modules/.bin/tsx. If you see ESM or top-level await errors, try --runner vite-node.
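The node_modules/.bin resolution described above amounts to something like this sketch (illustration only; the real CLI is not written in Python and may walk parent directories or handle platform-specific binary names):

```python
from pathlib import Path

def resolve_runner(name: str, project_root: str = ".") -> str:
    """Prefer a locally installed runner binary over a bare name.

    Sketch of the documented behavior: if node_modules/.bin/<name>
    exists under the project root, use that path; otherwise return
    the bare name and let the OS search PATH.
    """
    local = Path(project_root) / "node_modules" / ".bin" / name
    return str(local) if local.exists() else name
```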

Python

bt eval also runs Python eval files. Use --language py to force language detection. By default, if VIRTUAL_ENV is set, bt uses that virtualenv’s Python; otherwise it searches PATH for python3 or python. To use a specific interpreter, set BT_EVAL_PYTHON_RUNNER to its name or path (e.g. python3.11). The --num-workers flag controls concurrency for Python execution.
bt eval my_eval.py
bt eval --language py --num-workers 4 my_eval.py

Flags

| Flag | Env var | Description |
| --- | --- | --- |
| --runner <RUNNER> | BT_EVAL_RUNNER | Runner binary (tsx, bun, ts-node, python, etc.) |
| --language <LANG> | BT_EVAL_LANGUAGE | Force language: js or py |
| --filter <PATTERN> | BT_EVAL_FILTER | Run only evaluators matching the pattern |
| --watch / -w | BT_EVAL_WATCH | Re-run when input files change |
| --no-send-logs | BT_EVAL_LOCAL | Run without sending results to Braintrust |
| --num-workers <N> | | Worker threads for Python execution |
| --verbose | | Show full errors and stderr from eval files |
| --list | | List evaluators without running them |
| --jsonl | | Output one JSON summary per evaluator (for scripts). See also the global --json flag (overview), which formats all CLI output as JSON rather than per-evaluator summaries. |
| --terminate-on-failure | | Stop after the first failing evaluator |
| --dev | | Start a local web server for browser-based eval development (default port: 8300) |
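Since --jsonl emits one JSON object per line, a script can consume the output line by line. The sketch below only parses the stream; the summary field names (such as "name") are hypothetical, so inspect the actual output of your bt eval version before relying on any specific key:

```python
import json

def read_summaries(stream) -> list[dict]:
    """Parse one JSON summary per line, as --jsonl emits.

    Sketch only: blank lines are skipped, and no assumptions are
    made about the schema of each summary object.
    """
    summaries = []
    for line in stream:
        line = line.strip()
        if line:
            summaries.append(json.loads(line))
    return summaries
```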

Passing arguments to the eval file

Use -- to forward extra arguments to the eval file via process.argv:
bt eval foo.eval.ts -- --description "Prod" --shard 1/4
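The -- separator follows the common end-of-options convention: everything before the first bare -- belongs to bt eval, everything after is forwarded verbatim to the eval file (where a JavaScript eval reads it from process.argv). A sketch of that split, not the CLI's own code:

```python
def split_forwarded(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split a command line at the first bare `--` token.

    Returns (args for bt eval itself, args forwarded to the eval
    file). Tokens that merely start with "--" (like --shard) are
    not separators; only an exact "--" is.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []
```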

Running in CI

Set BRAINTRUST_API_KEY instead of using OAuth login:
# GitHub Actions example
- name: Run evals
  env:
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
  run: bt eval tests/
Use --no-input and --json for non-interactive output:
BRAINTRUST_API_KEY=... bt eval tests/ --no-input --json