
Overview

Zenflow workflows are built from steps. Each step can run with a different agent, model, and configuration. This means you can assign a planning-focused model to the planning step, a fast model to implementation, and run multiple models in parallel for code review — all within the same task. This is configured through agent presets. A preset defines the agent CLI, model, and settings for a given step. You can assign presets per step in a custom workflow or change them at runtime from the chat composer.

Why Different Models for Different Steps

Models have different performance profiles. Some are better at reasoning and architectural planning. Others are faster and cheaper for straightforward implementation. Review benefits from multiple perspectives.
| Step | What matters | Good fit |
| --- | --- | --- |
| Planning | Reasoning depth, architecture decisions, spec quality | Claude Opus, GPT-5.4 |
| Implementation | Speed, code generation, tool use | Gemini Flash, Codex, Sonnet |
| Review | Catching issues the implementation model missed | A different model than the one that wrote the code |

The goal is not to use the most expensive model everywhere. It’s to use the right model for each phase.

Setting Up Agent Presets

Agent presets are configured in Settings → Default agents. Each preset defines:
  • Agent CLI — which runtime to use (ZenCLI, Claude Code, Codex, Gemini)
  • Model — which model the agent uses (e.g., Opus 4.6, Gemini Flash, Sonnet 4.6)
  • Configuration — execution mode, approval policy, tool permissions
You can create multiple presets — for example, opus-planner, flash-builder, sonnet-reviewer — and assign them to different workflow steps.

Assigning Presets to Workflow Steps

In a custom workflow file (.zenflow/workflows/), use the <!-- agent: preset-name --> comment to bind a preset to a specific step:
```markdown
# Multi-Agent Feature Workflow

## Configuration
- **Artifacts Path**: {@artifacts_path} → `.zenflow/tasks/{task_id}`

---

## Workflow Steps

### [ ] Step: Planning
<!-- agent: opus-planner -->
Analyze the task requirements. Produce a technical specification in `{@artifacts_path}/spec.md` covering:
- Architecture decisions and trade-offs
- Affected files and modules
- Edge cases and error handling
- Verification criteria

### [ ] Step: Implementation
<!-- agent: flash-builder -->
Implement the changes described in `{@artifacts_path}/spec.md`. Follow the project's existing patterns and conventions.
- Write code that satisfies the spec
- Include unit tests for new logic
- Update `{@artifacts_path}/plan.md` with completed items

### [ ] Step: Review
<!-- agent: sonnet-reviewer -->
Review all changes from the Implementation step against `{@artifacts_path}/spec.md`.
- Check for correctness, edge cases, and regression risks
- Verify scope discipline — no unnecessary changes
- Confirm tests cover the new behavior
- Record findings in `{@artifacts_path}/review.md`
```

When the task runs, Zenflow automatically switches to the assigned preset at each step boundary.

Example: Three-Model Workflow

This is the pattern the evaluation report above motivates. Different models contribute where they’re strongest:

Step 1 — Planning with Opus

Opus handles planning and specification. It’s good at reasoning through architecture, identifying edge cases, and producing structured specs that downstream steps can follow. The planning step outputs spec.md with requirements, affected code paths, contract definitions, and verification criteria. This artifact becomes the source of truth for the implementation and review steps.

Step 2 — Implementation with Gemini Flash

Gemini Flash handles implementation. It’s fast, cost-efficient, and effective at translating a clear spec into working code. Because the planning step already defined what needs to happen, the implementation model doesn’t need to make architectural decisions — it executes. This is where model cost matters most. Implementation steps are typically the longest-running and most token-intensive part of a workflow. Using a cheaper, faster model here directly reduces cost without sacrificing quality, because the spec constrains the solution space.
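To make the cost argument concrete, here is a small arithmetic sketch. The model names and per-million-token prices are hypothetical placeholders, not real pricing; the point is only that the token-heavy implementation step dominates total cost.

```python
# Illustrative cost comparison for a token-heavy implementation step.
# Model names and prices per 1M tokens are hypothetical placeholders.
PRICE_PER_M_TOKENS = {
    "premium-planner": 15.00,  # hypothetical premium model
    "fast-builder": 0.30,      # hypothetical fast, cheap model
}

def step_cost(model: str, tokens: int) -> float:
    """Cost of pushing `tokens` tokens through `model`."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000

# Planning is short; implementation dominates token usage.
plan_tokens, impl_tokens = 50_000, 2_000_000

mixed = step_cost("premium-planner", plan_tokens) + step_cost("fast-builder", impl_tokens)
premium_only = step_cost("premium-planner", plan_tokens + impl_tokens)

print(f"mixed presets: ${mixed:.2f}, premium everywhere: ${premium_only:.2f}")
```

Even with made-up numbers, the shape of the result holds: keeping the expensive model only on the short planning step cuts the bill by an order of magnitude while the spec keeps implementation quality constrained.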

Step 3 — Review with a Different Model

The review step uses a different model than the one that wrote the code. This is deliberate. A model reviewing its own output is less likely to catch issues — it tends to agree with its own reasoning. A different model brings a different set of biases and catches different classes of problems. The reviewer evaluates the implementation against the spec on two axes.

Delivery — did the implementation actually solve the problem?
  • Does it address the root cause or just the symptom?
  • Does it satisfy the contracts defined in the spec?
  • Are all required touchpoints updated (call sites, validators, schemas, types)?

Engineering — is the implementation safe and maintainable?
  • Does it introduce regression risks?
  • Is the scope focused on the task, or does it include unnecessary changes?
  • Is the code idiomatic and maintainable?
This separation matters because an implementation can score well on one axis and poorly on the other. Code can solve the problem correctly but be unsafe to merge. Code can be clean and focused but miss half the required changes. Evaluating both independently gives a more useful signal than a single pass/fail.

Multi-Model Review

For higher-stakes changes, you can run multiple reviewers in parallel. Zenflow supports this through additional review steps or through SubAgents.

Option A: Sequential Review Steps

Add multiple review steps to your workflow, each with a different preset:
```markdown
### [ ] Step: Review (Opus)
<!-- agent: opus-reviewer -->
Deep review of architectural decisions, contract adherence, and regression risks.

### [ ] Step: Review (Sonnet)
<!-- agent: sonnet-reviewer -->
Implementation-level review: code quality, test coverage, scope discipline.
```

Option B: SubAgent-Powered Review

ZenCLI supports SubAgents — isolated agent processes that run with different models. The built-in /comprehensive-review skill uses SubAgents to run multiple models against the same diff in parallel and synthesize their findings.
```
Run /comprehensive-review on the current changes
```
This produces a consolidated review with findings from multiple models, each evaluating the code from a different perspective.
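The fan-out-and-synthesize pattern can be sketched as follows. The reviewer functions and their findings below are illustrative stand-ins, not the actual internals of `/comprehensive-review` or any real ZenCLI API: each reviewer takes a diff and returns findings, the reviews run in parallel, and the results are merged with duplicates removed.

```python
# A minimal sketch of parallel multi-reviewer fan-out with synthesis.
# Reviewer functions are stubs standing in for separate agent processes.
from concurrent.futures import ThreadPoolExecutor

def review_architecture(diff: str) -> list[str]:
    # Stand-in for an architecture-focused reviewer model.
    return ["contract change in parse() lacks a migration note"]

def review_code_quality(diff: str) -> list[str]:
    # Stand-in for a code-quality-focused reviewer model.
    return ["new branch in handler is untested"]

def comprehensive_review(diff: str) -> list[str]:
    """Run all reviewers in parallel and merge findings, deduplicated."""
    reviewers = [review_architecture, review_code_quality]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda review: review(diff), reviewers)
    seen: set[str] = set()
    merged: list[str] = []
    for findings in results:
        for finding in findings:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged
```

The synthesis step matters: two reviewers often flag the same issue in different words, so a real implementation would cluster near-duplicate findings rather than string-match them as this sketch does.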

Evaluation Criteria for Review Steps

When configuring review steps, it helps to have a concrete rubric. The following criteria define what a review step should evaluate. You can include these in your workflow step descriptions or in the reviewer’s prompt.

Delivery

| Criterion | What to check |
| --- | --- |
| Semantic resolution | Does the patch fix the actual root cause, or does it paper over the symptom? |
| Contract adherence | Do public signatures, return types, error types, and API shapes match what the codebase expects? |
| Integration completeness | Are all affected call sites, validators, serializers, schemas, and compatibility layers updated? |

Engineering

| Criterion | What to check |
| --- | --- |
| Regression safety | Does the patch break anything that previously worked? Look for changed signatures, removed validation, altered error behavior. |
| Scope discipline | Is every change in the diff justified by the task? Unnecessary refactoring, extra features, and unrelated cleanup hurt here. |
| Maintainability | Is the code idiomatic for the repository? Is the logic clear? Are there dead paths, duplication, or half-finished constructs? |
Each criterion maps to a simple scale:
| Score | Meaning |
| --- | --- |
| 0 — Inadequate | Weak on this dimension |
| 1 — Partial | Meaningful progress, but significant gaps remain |
| 2 — Solid | Generally good, only limited issues |
| 3 — Excellent | Very strong result |

A patch is merge-ready when delivery and engineering scores are both solid (≥ 2) across all criteria. A patch needs small follow-up when one criterion is partial but the gap is bounded. A patch needs substantial rework when multiple criteria have gaps or the semantic resolution is only partial.
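The decision rule above can be written as a small function. The criterion names and the exact thresholds here are one possible reading of the rubric for illustration, not Zenflow behavior: all criteria at 2 or above is merge-ready, a single bounded partial gap is a small follow-up, and anything worse (multiple gaps, a score of 0, or partial semantic resolution) means rework.

```python
# One possible encoding of the merge-readiness rule; criterion names
# and thresholds follow the rubric tables above, scores are 0-3.
def verdict(delivery: dict[str, int], engineering: dict[str, int]) -> str:
    """Combine per-criterion scores on both axes into a single verdict."""
    scores = {**delivery, **engineering}
    gaps = {name: s for name, s in scores.items() if s < 2}  # below "solid"
    if not gaps:
        return "merge-ready"
    # A partial semantic resolution alone already forces rework.
    if (len(gaps) == 1
            and scores.get("semantic_resolution", 3) >= 2
            and min(gaps.values()) == 1):
        return "small follow-up"
    return "substantial rework"

result = verdict(
    {"semantic_resolution": 3, "contract_adherence": 2, "integration_completeness": 2},
    {"regression_safety": 2, "scope_discipline": 3, "maintainability": 2},
)
print(result)  # prints merge-ready
```

Scoring the two axes separately before combining them preserves the signal discussed earlier: a patch that solves the problem but is unsafe to merge fails on engineering, not delivery, and the follow-up work differs accordingly.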

Runtime Model Switching

You don’t have to define everything upfront in the workflow file. During any step, you can switch the active model from the chat composer dropdown. This is useful when:
  • A step is stuck and you want to try a different model
  • You want a quick second opinion on a specific question
  • The default preset for a step isn’t performing well on a particular task
The model switch applies to the current chat session. It doesn’t change the workflow definition.

Practical Guidelines

Start simple. A two-preset setup (one for planning, one for implementation) already captures most of the value. Add review presets when you have a workflow that warrants it.

Match model strengths to step requirements. Planning needs reasoning depth. Implementation needs speed and tool use. Review needs a different perspective from the implementation model.

Use the spec as the coordination artifact. When different models handle different steps, the spec is what keeps them aligned. A clear spec means the implementation model doesn't need to re-derive architectural decisions, and the review model has a concrete reference to evaluate against.

Don't over-optimize. The point is not to find the perfect model for every micro-task. It's to avoid using a slow, expensive model for work that a faster one handles equally well, and to avoid reviewing code with the same model that wrote it.