Follow this guide when you first pull the repository or whenever you spin up a fresh environment.
Prerequisites
macOS / Linux shell with Python 3.11+ (AutoData ships type hints that assume 3.11).
uv for dependency and virtualenv management.
Playwright system dependencies (Chromium download + lib dependencies).
API keys for at least one LLM provider (OpenAI, Anthropic, Google, or OpenRouter).
Tip
uv installs project dependencies into .venv/. Always run CLI commands through uv run ... so the lockfile is honored.
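The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of AutoData: `check_prerequisites` is a hypothetical helper that only verifies the Python version and that the named tools are on `PATH`. It does not check the Playwright system libraries or API keys.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version taken from the prerequisites list


def check_prerequisites(required_tools=("uv", "git")):
    """Return a list of human-readable problems; an empty list means nothing is missing."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}"
        )
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

Running the script prints one line per missing prerequisite and nothing when everything it checks is present.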
Clone & Install
# Clone either the public repo or the dev fork
git clone https://github.com/Tianyi-Billy-Ma/AutoData.git
cd AutoData
# Sync dependencies for development, testing, and docs
uv sync --group dev --group test --group docs
# Download browser binaries and system deps for browser-use
uv run playwright install
uv run playwright install-deps # Linux containers only
Provide Credentials
Copy the example environment file and edit it with your keys.
cp .env.example .env # fill in OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
Export any additional API keys required by plugins or tools (e.g., PPLX_API_KEY for the Perplexity search tool, TIINGO_API_KEY for the financial plugin).
When using OpenRouter (or any OpenAI-compatible proxy), set:
export OPENROUTER_API_KEY="..."
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
AutoData automatically picks up these variables if llm_config.api_key / base_url are left null.
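The fallback described above (environment variables are used only when the config leaves a field null) can be pictured with a short sketch. `resolve_llm_settings` is a hypothetical helper written for illustration. AutoData's real lookup code and precedence rules may differ, and only the OpenRouter variable names come from this guide.

```python
import os


def resolve_llm_settings(llm_config, env=None):
    """Fill api_key/base_url from the environment when the config leaves them null.

    Hypothetical helper: explicit config values win, environment variables
    are only a fallback. This is an assumption about the behavior, not
    AutoData's actual implementation.
    """
    env = os.environ if env is None else env
    resolved = dict(llm_config)  # copy so the caller's config is not mutated
    if resolved.get("api_key") is None:
        resolved["api_key"] = env.get("OPENROUTER_API_KEY")
    if resolved.get("base_url") is None:
        resolved["base_url"] = env.get("OPENROUTER_BASE_URL")
    return resolved
```

With both fields set to null in the config, the returned dictionary carries whatever was exported in the shell. A value set explicitly in the config is left untouched.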
Run Your First Task
# Validate configuration without running agents
uv run python -m autodata.main --config configs/default.yaml --dry-run
# Kick off a real task (overrides run name at the CLI)
uv run python -m autodata.main \
--config configs/default.yaml \
--run-name aapl-daily-2024 \
--task "Collect daily AAPL candles for 2024"
Common CLI flags:
--visualize-graph saves a Mermaid rendering of the LangGraph graph.
--execution-strategy chooses between synchronous (stream, run) and async (astream, arun) execution APIs.
--disable-human skips interactive HumanAgent confirmations.
--overwrite / --force-overwrite let you reuse the same run_name safely.
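To make the split between synchronous and async strategies concrete, here is a rough `argparse` sketch that mirrors some of the flags listed above. It is illustrative only: the parser name, the default strategy, and the `needs_event_loop` helper are assumptions, not AutoData's actual CLI definition.

```python
import argparse

SYNC_STRATEGIES = ("stream", "run")
ASYNC_STRATEGIES = ("astream", "arun")


def build_parser():
    """Build a parser that mirrors a subset of the flags named in this guide."""
    parser = argparse.ArgumentParser(prog="autodata-sketch")  # hypothetical name
    parser.add_argument("--config", required=True)
    parser.add_argument("--run-name")
    parser.add_argument("--task")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--visualize-graph", action="store_true")
    parser.add_argument("--disable-human", action="store_true")
    parser.add_argument(
        "--execution-strategy",
        choices=SYNC_STRATEGIES + ASYNC_STRATEGIES,
        default="stream",  # assumed default, not confirmed by the docs
    )
    return parser


def needs_event_loop(strategy):
    """Async strategies would have to be driven from an asyncio event loop."""
    return strategy in ASYNC_STRATEGIES
```

A caller could branch on `needs_event_loop(args.execution_strategy)` to decide whether to wrap the run in `asyncio.run(...)` or call it directly.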
Inspect the Outputs
Every run writes to outputs/<run_name>/ (auto-generated if you omit --run-name). Key folders:
| Path | Contents |
|---|---|
| | Frozen configuration with merged CLI overrides so you can reproduce the run. |
| | High-level metadata, tool outputs, and dataset pointers. |
| | Generated Python files, research documents, CSV/JSON dumps, or ZIP archives returned by agents. |
| | Temporary working directory for the Python REPL tool and engineer/test agents. Safe to clean between runs. |
| | Structured logs at the level defined by |
| | OHCache artifacts + metadata files. Useful for debugging context routing. |
| | Playwright/browser-use state, screenshots, and optional video recordings. |
| | Serialized checkpoints when |
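For a quick look at what a run produced, a few lines of Python can summarize each top-level entry of a run directory. This is a minimal sketch using only the standard library. `summarize_run` is a hypothetical helper and makes no assumptions about which folder names AutoData creates.

```python
from pathlib import Path


def summarize_run(run_dir):
    """Map each top-level entry of a run directory to (file_count, total_bytes)."""
    run_dir = Path(run_dir)
    summary = {}
    for entry in sorted(run_dir.iterdir()):
        if entry.is_dir():
            # Count every regular file below this folder, at any depth
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        summary[entry.name] = (len(files), sum(p.stat().st_size for p in files))
    return summary
```

For example, `summarize_run("outputs/aapl-daily-2024")` would return one entry per file or folder written by the run started earlier, which makes empty or unexpectedly large folders easy to spot.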
Resume or Clean Runs
List checkpoints:
uv run python -m autodata.checkpoint --run-name aapl-daily-2024 list --json
Save ad-hoc checkpoint:
uv run python -m autodata.checkpoint save --name after-blueprint --stage=research
Resume a run: add checkpoint_config.resume_from: "checkpoint/<file>.bin" to your config or pass --checkpoint.resume_from=<file> on the CLI.
Clean up old checkpoints:
uv run python -m autodata.checkpoint clean --max-keep=5 --older-than-days=7
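The guide does not spell out how `--max-keep` and `--older-than-days` interact. The sketch below shows one plausible retention policy, assuming a checkpoint is removed when it falls outside the newest `max_keep` files or is older than the cutoff. `select_checkpoints_to_delete` is a hypothetical function for illustration. It only selects files, deletes nothing, and may not match what the `clean` subcommand actually does.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def select_checkpoints_to_delete(checkpoint_dir, max_keep=5, older_than_days=7, now=None):
    """Return checkpoint files that an assumed retention policy would remove.

    Assumed policy: a file is selected if it is not among the `max_keep`
    newest files OR its modification time is older than the cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * SECONDS_PER_DAY
    # Newest first, judged by modification time
    files = sorted(
        (p for p in Path(checkpoint_dir).glob("*.bin") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    to_delete = []
    for index, path in enumerate(files):
        too_many = index >= max_keep
        too_old = path.stat().st_mtime < cutoff
        if too_many or too_old:
            to_delete.append(path)
    return to_delete
```

Returning the selection instead of deleting in place keeps the policy easy to inspect as a dry run before anything is removed.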
Helpful Developer Commands
| Goal | Command |
|---|---|
| Format & lint | |
| Run tests | |
| Debug a single agent | |
| Inspect parity harness | |
Warning
Runs can make network calls through browser-use and tool APIs. Ensure you comply with each site’s terms of use when providing tasks.
You are ready to tailor the configuration. Continue to LLM Provider Setup for the full schema and environment variable reference.