Bolt Pipeline Healer

When a Bolt run fails, Paradime's Self-Healing feature can automatically trigger a DinoAI programmable agent against the failure. This guide walks you through wiring up a bolt-pipeline-healer agent that does more than blindly fix every failure — it deduplicates against prior self-healing attempts and existing open PRs before opening a new one.

The result is an autopilot that opens a fix PR when one is genuinely needed and stays quiet when an earlier attempt is already waiting for review.

compass

Before You Start

Integrations

The following must already be connected in Paradime:

  • Slack — Self-Healing posts and the agent's progress run in the configured slack_channel

Recommended reading

Before proceeding, read the Programmable Agents section under Products → DinoAI:

And the Self-Healing flow:

What You'll Build

By the end of this guide you'll have:

  • A bolt-pipeline-healer DinoAI agent YAML committed under .dinoai/agents/

  • A Bolt schedule with self_healing.enabled = true pointing at this agent

  • A safe-by-default autopilot that only opens a new PR when no prior PR already addresses the same error

What Happens When a Run Fails

Once Self-Healing is enabled on a Bolt schedule:

The dedup check is the heart of this agent. Without it, every failed run of the same broken model would create a new PR — burying the team in duplicate fix attempts.

Architecture Overview

1

Create the Agent YAML

Commit this file at .dinoai/agents/bolt-pipeline-healer.yml:

Tool allowlist is deliberately narrow. The agent has read access (read_file, ripgrep_search, run_sql_query), Bolt observability (list_bolt_schedules, get_bolt_run_logs), PR awareness (list_pull_requests), and a terminal (run_terminal_command) to commit changes and open the PR. No post_slack_message is needed — Self-Healing already threads the agent into a Slack channel, and the agent's stdout shows up there automatically.

2

Enable Self-Healing on the Bolt schedule (UI)

In the Bolt schedule editor for the schedule you want to heal:

  1. Open the Self-Healing section.

  2. Toggle Enable Self-Healing.

  3. Pick the Slack channel the agent should run in — e.g. #agent-demo. It must already be configured under Notification Settings on this schedule.

  4. From the Agent Name dropdown, pick bolt-pipeline-healer. The list is populated from .dinoai/agents/*.yml on the schedule's git branch — so make sure the YAML from Step 1 has been merged before configuring this.

  5. Save the schedule.

See Bolt → Self-Healing for the full UI walkthrough.

3

Or: enable Self-Healing via YAML (schedules-as-code)

If you manage Bolt schedules as code, add the self_healing block to the schedule entry:

See the full schema at Schedules as Code → Configuration Reference → Self-Healing.

4

Watch the first heal in action

The next time the schedule fails:

  1. Open #agent-demo in Slack and find the failure notification for the run.

  2. Inside that thread you'll see:

    • 🦖 Self-healing enabled — starting healing session...

    • A series of agent messages showing the dedup check in action (current logs → prior sessions → open PRs)

    • Either:

      • A "fix already in progress" post linking the existing PR (when the dedup check trips), or

      • A PR link to the freshly opened fix branch (when the failure is new)

  3. Review the PR, merge if you're happy, then retry the Bolt run from the Run Details page.

How the dedup check works

The dedup check is what makes this agent safe to leave running unattended. It's driven by three pieces of context the agent reasons over:

Input
Source
Used to answer

Current error

get_bolt_run_logs(run_id=<failed run>)

What is failing right now?

Prior self-healing attempts

Initial context block prepended automatically by Paradime when this schedule has prior runs

Has the agent already seen this exact failure recently?

Open pull requests

list_pull_requests

Is there already an unmerged fix attempt waiting for review?

The agent stops with a Slack pointer instead of opening a duplicate PR if both conditions hold:

  1. The current error matches an error from a prior self-healing attempt.

  2. An open PR already exists referencing that error or the failing model.

If either condition fails — the error is genuinely new, or the prior PR was already merged / closed — the agent proceeds with the fix.

Why the initial context matters. Paradime injects a "Prior self-healing attempts for this schedule" block into the agent's prompt automatically when prior sessions exist for the same schedule_name_uuid. This is what gives the agent recall across runs without needing its own memory tool.

File Structure

Your repository should look like this after completing the setup:

Last updated

Was this helpful?