AutoData is driven by a single AutoDataConfig object that can be loaded from YAML, TOML, or JSON. This page describes the important sections, how CLI overrides behave, and which environment variables you should set beforehand.

Tip

All configuration files live under configs/. Copy configs/default.yaml, rename it, and commit the copy for task-specific presets.

LLM Provider Setup

AutoData supports multiple LLM backends through LangChain’s init_chat_model. Keep llm_config.api_key and llm_config.base_url as null to let AutoData infer settings from environment variables, or hardcode them per run.

OpenAI

export OPENAI_API_KEY="your-openai-key"

Anthropic

export ANTHROPIC_API_KEY="your-anthropic-key"

Google

export GOOGLE_API_KEY="your-google-key"

OpenRouter (or any OpenAI-compatible proxy)

export OPENROUTER_API_KEY="your-openrouter-key"
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

When these variables are present, AutoData auto-populates llm_config.api_key/base_url. This applies to other OpenAI-compatible services as well.

Using `.env` Files for Secrets

AutoData loads environment variables from .env, .env.local, or ~/.autodata/.env before parsing your configuration. Copy the example template, fill it once, and skip repeated export commands:

cp .env.example .env
echo "OPENAI_API_KEY=sk-demo" >> .env

You can still override individual values at runtime with export or CLI flags, but keeping your API keys in the .env file ensures uv run python -m autodata.main ... always finds them, including inside automation scripts and CI runners.

File Formats & Loading

# Pick the config you want to run
uv run python -m autodata.main --config configs/finance.yaml

# Override detection if the extension is missing
uv run python -m autodata.main --config configs/generated --config-format yaml

YAML is the default, but JSON and TOML work the same way—just pass the extension or use --config-format whenever the filename is ambiguous:

# JSON
uv run python -m autodata.main --config configs/portfolio.json

# TOML (extension omitted, format forced via flag)
uv run python -m autodata.main --config configs/research --config-format toml

Internally AutoDataConfig.from_file() parses the file, validates every field via Pydantic, and expands relative paths to absolute ones. CLI arguments then override the loaded structure via apply_cli_overrides. The priority order is: CLI > config file > defaults.

Core Fields

Field	Description
`run_name`	Defaults to `default_run`. Provide a custom value when you want a unique folder name (it is only required when reusing directories with `--overwrite`).
`task`	Free-form instruction that the Supervisor reads when executing the graph. CLI flag `--task` temporarily overrides it.
`task_timeout`	Max run duration in seconds (default `3600`).
`disable_human`	When `true`, AutoData auto-confirms HumanAgent prompts and remains non-interactive.
`execution_strategy`	Choose `stream`, `run`, `astream`, or `arun` to control synchronous vs async LangGraph execution.
`enabled_plugins` / `plugin_config`	List of plugin modules under `autodata.plugins` the graph should import (`financial`, `sport`, `academic`, …). Each plugin can inject prompts or LangChain tools.

Storage & Logging

storage_config:
  type: "file"
  output_dir: "./outputs"
  file_format: "json"
  compression: null
  database_url: null
  overwrite: true
  force_overwrite: true

output_dir defines where AutoData writes runs. Paths are resolved relative to the repo unless you specify an absolute path.
overwrite and force_overwrite default to true, so repeated runs with the same run_name overwrite earlier artifacts with no prompt. Set them to false if you prefer manual protection.
run_dir, cache_dir, work_dir, and video_dir are derived from run_name; you rarely need to set them manually.

Logging is configured through log_config:

log_config:
  log_level: "INFO"   # override with --log-level
  log_file: null       # optional path relative to run_dir/logs
  metrics_port: 9090   # reserved for Prometheus exporters

Language Model Settings

The llm_config object mirrors LangChain’s init_chat_model arguments. Example:

llm_config:
  model: "gpt-4o-mini"
  model_provider: null      # auto-inferred when omitted
  temperature: 0.0
  base_url: null            # point to OpenRouter or self-hosted proxy
  api_key: null             # auto-detect OPENAI_/ANTHROPIC_/OPENROUTER_... variables
  configurable_fields: null # e.g. "any" or ["temperature", "model"]

If you want to override the base URL or key inside the configuration file, set llm_config.api_key/base_url explicitly. Otherwise leave them null and rely on exported environment variables.

Tooling & Plugins

tool_config:
  work_dir: null                # falls back to outputs/<run_name>/work
  cache_dir: null               # inherits from config.cache_dir
  PerplexitySearchToolModel: "sonar"

Set PPLX_API_KEY to let ToolAgent call the Perplexity search tool.
When plugins are listed in enabled_plugins, AutoData loads their PluginSpec definitions, applies prompt injections per agent, and binds additional LangChain tools.

OHCache Hypergraph

ohcache_config:
  enable_ohcache: true
  cache_dir: null        # defaults to outputs/<run_name>/cache
  auto_cleanup: false
  hyperedges:
    - id: research
      source: [PlanAgent]
      target: [SupervisorAgent, HumanAgent, BrowserAgent]
      message_type: "plan"

Hyperedges define which agents share messages. Sources are singleton sets (an agent emits a message) and targets can include any number of recipients.
Message types (message_type) act like channels—Blueprint updates can be isolated from Browser observations.
When enable_ohcache is false the system reverts to naïve context passing. Leave it enabled for best token usage.
Cached artifacts persist on disk per entry (cache/meta/*.json + cache/artifacts/*). Set auto_cleanup: true to prune expired entries on startup.

Checkpoints

checkpoint_config:
  checkpoint_enabled: true
  auto_checkpoint: true      # save snapshots automatically after agents run
  checkpoint_dir: null       # defaults to outputs/<run_name>/checkpoint
  export_json: false         # when true, writes human-readable JSON next to binaries
  resume_from: null
  max_checkpoints: null

You can override any of these via CLI using dot notation, e.g. --checkpoint.resume_from=checkpoint/manual.bin. Use the dedicated CLI for inspection:

uv run python -m autodata.checkpoint list --run-name aapl-run
uv run python -m autodata.checkpoint load checkpoint/manual.bin --json

Browser & Agent Controls

AutoData splits browser-use settings into two models. You can provide a legacy flat browser_config block (as shown in configs/default.yaml) or the explicit nested structure:

browser_use_browser_config:
  headless: true
  disable_security: false
  user_agent: null
  args: []               # Chromium arguments
  record_video_dir: null # leave null to write under outputs/<run>/browser

browser_use_agent_config:
  max_steps: 20
  max_actions_per_step: 50
  llm_timeout: 120
  generate_gif: false
  file_system_path: null

setup_output_directory() ensures browser_use_agent_config.file_system_path points to outputs/<run_name>/browser so browser-use can persist histories between agent calls.

CLI Reference

See Supported Arguments for every configuration field, its default, and the generated CLI flag. The table below highlights the switches teams override most often during day-to-day runs.

Flag	Effect
`--config-path PATH`	Load an alternative config file (default `configs/default.yaml`).
`--config-format {yaml,json,toml}`	Force parsing format if the extension is missing.
`--run-name NAME`	Override `run_name`/`storage_config.run_name`. Validated for alphanumeric/`-_`.
`--task "..."`	Override the task description for this invocation only.
`--model`, `--temperature`	Override LLM settings.
`--output-dir PATH`	Redirect the entire run folder (useful for shared storage).
`--log-level LEVEL`	Set logging verbosity without editing the config file.
`--visualize-graph`	Persist graph diagrams under the run directory.
`--dry-run`	Stop after validation and directory creation.
`--disable-human`	Mirrors the config flag for a non-interactive run.
`--overwrite`, `--force-overwrite`	Control existing run directory handling.
`--checkpoint.*`	Use dot-notation to override any checkpoint setting from the CLI.

Always keep your configuration files in version control when possible so the generated summary.json plus config.yaml in outputs/<run_name>/ allow you to reconstruct a run precisely.