Deploying to Akash with a Multi-Agent Team and Private Container Registries

Summary

A walkthrough of deploying a containerised application to the Akash Network using a Hermes multi-agent system powered by AkashML AI models. It covers the full lifecycle from Dockerfile to a live deployment on decentralised infrastructure, including the deploy akash skill and the private container registry pattern.

What Is Akash?

Akash is an open-source decentralised cloud computing marketplace. Unlike traditional cloud providers that run on centralised data centres, Akash aggregates idle compute capacity from a global network of providers. Workloads are deployed via a peer-to-peer marketplace where bids compete on price, which often gives users access to compute at a fraction of AWS, GCP, or Azure rates.

Key properties that make Akash unique:

  • Permissionless: Anyone with a wallet can deploy. No accounts, no credit cards, no support tickets.
  • Censorship-resistant: Your workload runs on independent providers around the world, not inside a single corporate data centre.
  • Cost-competitive: Competitive bidding drives prices down. You specify your maximum price; providers undercut each other to win the lease.
  • Kubernetes-native: Deployments use the SDL (Stack Definition Language), which compiles down to Kubernetes manifests. If you can write a Docker Compose file, you can write an SDL.
  • Community-governed: Protocol parameters, software upgrades, and economic policy are decided by AKT token holders.

Akash operates as a live decentralised compute marketplace with active providers and production workloads. It represents one approach to decentralised infrastructure and marketplace-driven compute.

The Problem

You have a containerised application (a static site, an API service, a dashboard) and you want it running on Akash. You also want it built in CI, pushed to a private registry, and deployed automatically. Sounds straightforward, right?

In practice, the image build, registry authentication, SDL configuration, and marketplace pricing all interact, and a mistake in any one of them can stall the deployment. This is exactly where a multi-agent collaboration system proves its value. Instead of one person grinding through these interconnected issues, specialised agents work in parallel, each tackling their own domain, while a coordinator validates the integrated result.

The Architecture: Hermes with AkashML Models

The system uses three agent roles, all powered by AkashML AI models:

  • Coordinator Agent: Authors content, assigns tasks, validates results, and manages the overall workflow.
  • Implementation Worker: Handles Dockerfile, CI pipeline configuration, and deployment scripts. Writes code that compiles and runs.
  • Review Worker: Critiques architecture, flags security risks, identifies weak assumptions, and validates operational concerns.

These agents operate from a shared project workspace, a durable filesystem where artifacts are committed, versioned, and shared. Agent outputs are not ephemeral chat messages; they are files in a Git repository.

The deploy akash Skill

The workflow described here is codified into a reusable Hermes skill: deploy akash. A skill is a procedural playbook that agents consult when they encounter a recurring task. It is not black-box automation: it is a documented, versioned, peer-reviewed guide that any agent can follow, inspect, or improve.

The deploy akash skill encodes the following safety rules and workflow:

  • Managed wallet API only: Agents authenticate with Akash via the Akash Console REST API using a single API key header (x-api-key). No self-custody keys, no mnemonics, no local keyring files. This keeps the deployment flow stateless and safe to run inside CI.
  • Private registry prerequisite: Before any deployment, the worker must verify that a deployable container image exists. If no public image is available, the pipeline must provide registry credentials via CI environment variables, never committed to the repository.
  • SDL templating: The SDL file in the repository uses angle-bracket placeholders (<REGISTRY_USER>, <REGISTRY_PASS>) and a :latest image tag, which a shell script replaces at deploy time with CI secrets and the build version (BUILD_VERSION). This separation of template from runtime data keeps secrets out of version control.
  • Bid polling with timeout: The deployment script polls the Akash marketplace for bids, selects the cheapest provider, and creates a lease automatically. If no bids arrive within a configurable window, the deployment fails fast rather than hanging.
  • Teardown instructions: Every deployment skill must document how to close the lease. On Akash, open leases continue to charge the escrow deposit until closed or exhausted.
  • Error catalogue: Documented failure modes include “zero bids” (pricing too low, signedBy too restrictive), “invalid manifest” (SDL schema error), “pull failure” (registry credentials missing or incorrect), and “timeout” (provider did not start within the polling window).
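
The error catalogue in the last bullet could be kept as a small lookup that a deployment script prints when it fails. The sketch below is an illustration only: the function name akash_error_hint and the short failure keys are hypothetical, and only the four failure modes and their causes come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a failure key to the cause recorded in the catalogue.
akash_error_hint() {
  case "$1" in
    zero-bids)        echo "zero bids: pricing too low or signedBy too restrictive" ;;
    invalid-manifest) echo "invalid manifest: SDL schema error" ;;
    pull-failure)     echo "pull failure: registry credentials missing or incorrect" ;;
    timeout)          echo "timeout: provider did not start within the polling window" ;;
    *)                echo "unknown failure: $1" >&2; return 1 ;;
  esac
}
```

A script would call it on its failure path, for example akash_error_hint zero-bids after the bid loop gives up.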

When an agent is tasked with an Akash deployment, it loads this skill, follows the checklist, and produces a reproducible result. If a new failure mode is discovered, the skill is patched and all future agents inherit the fix.

Step 1: The Container Image

The Implementation Worker starts by writing a minimal Dockerfile:

FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
EXPOSE 80

Simple, correct, secure. The Review Worker confirms nginx:alpine is a reasonable base and that no secrets are baked into the image.

The Dockerfile is committed to the repository along with a CI pipeline that builds and pushes the image on every commit. The pipeline uses Docker-in-Docker and tags the image with the Git commit SHA for reproducibility, avoiding the trap of mutable :latest tags in production deployments.

A generic CI pipeline for this looks like:

stages:
  - build
  - deploy

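variables:
  DOCKER_DRIVER: overlay2
  # TLS is disabled for the DinD sidecar, so the CLI talks to it on port 2375
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""
  REGISTRY: your-registry.example.com
  IMAGE_NAME: your-app

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    # Read the password from stdin so it does not appear on the command line
    - echo "$REGISTRY_PASS" | docker login -u "$REGISTRY_USER" --password-stdin "$REGISTRY"
    - docker build -t "$REGISTRY/$IMAGE_NAME:$CI_COMMIT_SHA" .
    - docker push "$REGISTRY/$IMAGE_NAME:$CI_COMMIT_SHA"

deploy:
  stage: deploy
  image: alpine/curl:latest
  variables:
    # deploy.sh pins the SDL image tag to this value
    BUILD_VERSION: $CI_COMMIT_SHA
  before_script:
    # deploy.sh needs bash and jq in addition to curl
    - apk add --no-cache bash jq
  script:
    - ./deploy.sh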

This is vendor-agnostic Docker-in-Docker. The syntax above follows GitLab CI; the same pattern carries over to GitHub Actions with DinD runners, Drone, Woodpecker, or any CI system that supports service sidecars and Docker CLI access.
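
To guard against a mutable tag slipping through, a deploy step could refuse image references that are untagged or tagged latest. This is a minimal sketch, not part of the original pipeline; the function name require_pinned_tag is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the image reference carries a tag other than "latest".
require_pinned_tag() {
  local ref="$1"
  local name="${ref##*/}"   # last path component, e.g. your-app:3f2a9c1
  local tag=""
  case "$name" in
    *:*) tag="${name##*:}" ;;   # text after the last colon of that component
  esac
  if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
    echo "Refusing mutable or missing tag in: $ref" >&2
    return 1
  fi
}
```

Only the last path component is inspected, so a registry port such as localhost:5000 is not mistaken for a tag.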

Step 2: The SDL and the First Failure

The Review Worker authors the initial Akash SDL (Stack Definition Language) manifest:

---
version: "2.0"

services:
  web:
    image: your-registry.example.com/your-app:latest
    credentials:
      host: your-registry.example.com
      username: "<REGISTRY_USER>"
      password: "<REGISTRY_PASS>"
    expose:
      - port: 80
        as: 80
        to:
          - global: true

profiles:
  compute:
    web:
      resources:
        cpu:
          units: 0.1
        memory:
          size: 128Mi
        storage:
          - size: 512Mi

  placement:
    akash:
      signedBy:
        anyOf:
          - akash1365yvmc4s7awdyj3n2sav7xfx76adc6dnmlx63
          - akash18qa2a2ltfyvkyj0zgj6vzp3gdejwkpqwcwcwrr
      pricing:
        web:
          denom: uakt
          amount: 1000

deployment:
  web:
    akash:
      profile: web
      count: 1

The credentials use angle-bracket placeholders (<REGISTRY_USER>, <REGISTRY_PASS>). At deploy time, a script substitutes these with values from CI/CD secrets. This keeps credentials out of the Git history.

The deploy.sh script templates the SDL and submits it to Akash:

#!/usr/bin/env bash
set -euo pipefail

API="https://console-api.akash.network/v1"

# Template the SDL: substitute placeholders with CI secrets and pin the image tag
SDL=$(sed \
  -e "s|<REGISTRY_USER>|${REGISTRY_USER}|g" \
  -e "s|<REGISTRY_PASS>|${REGISTRY_PASS}|g" \
  -e "s|:latest|:${BUILD_VERSION}|g" \
  deploy.yml)

# Create deployment (jq builds the JSON body so the SDL text is escaped correctly)
RESPONSE=$(jq -n --arg sdl "$SDL" '{data: {sdl: $sdl, deposit: 5.5}}' |
  curl -fsS -X POST "$API/deployments" \
    -H "x-api-key: ${AKASH_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @-)

DSEQ=$(echo "$RESPONSE" | jq -r '.data.deployment_id.dseq')
echo "Deployment created: DSEQ=$DSEQ"

# Poll for bids: 30 attempts, 3 seconds apart
echo "Waiting for bids..."
COUNT=0
for i in $(seq 1 30); do
  BIDS=$(curl -fsS "$API/bids/$DSEQ" \
    -H "x-api-key: ${AKASH_API_KEY}")
  COUNT=$(echo "$BIDS" | jq '.data | length')
  if [ "$COUNT" -gt 0 ]; then
    echo "Bids received: $COUNT"
    break
  fi
  sleep 3
done

# Fail fast instead of trying to lease from a bid that does not exist
if [ "$COUNT" -eq 0 ]; then
  echo "No bids received for DSEQ=$DSEQ" >&2
  exit 1
fi

# Select cheapest provider (tonumber compares prices numerically even if they arrive as strings)
PROVIDER=$(echo "$BIDS" | jq -r '.data | sort_by(.bid.price.amount | tonumber) | .[0].bid.id.provider')
echo "Selected provider: $PROVIDER"

# Create lease
MANIFEST=$(echo "$RESPONSE" | jq -r '.data.manifest')
jq -n --argjson manifest "$MANIFEST" --arg dseq "$DSEQ" --arg provider "$PROVIDER" \
  '{manifest: $manifest, leases: [{dseq: $dseq, gseq: 1, oseq: 1, provider: $provider}]}' |
  curl -fsS -X POST "$API/leases" \
    -H "x-api-key: ${AKASH_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @-

echo "Lease created. Polling for endpoint..."

Note the explicit set -euo pipefail at the top, together with the -f flag on every curl call. Without them, a failed request or an HTTP error response would not stop the script, and you would be none the wiser that your deployment never happened.
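
One caveat with sed-based templating: in a sed replacement, the characters &, \ and the chosen delimiter (here |) are special, so a registry password containing any of them would corrupt the SDL. The sketch below escapes values first and fails if a placeholder survives. The function names sed_escape and render_sdl are hypothetical; the variable names match the script above.

```shell
#!/usr/bin/env bash
# Escape the characters that are special in a sed replacement using '|' as delimiter.
# Values containing newlines are not handled.
sed_escape() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# render_sdl TEMPLATE: print the SDL with placeholders filled from the environment.
render_sdl() {
  local template="$1"
  local rendered
  rendered=$(sed \
    -e "s|<REGISTRY_USER>|$(sed_escape "$REGISTRY_USER")|g" \
    -e "s|<REGISTRY_PASS>|$(sed_escape "$REGISTRY_PASS")|g" \
    -e "s|:latest|:$(sed_escape "$BUILD_VERSION")|g" \
    "$template")
  # Fail if any <UPPER_CASE> placeholder is still present after templating
  # (a secret that itself looks like <NAME> would also trip this check).
  if printf '%s\n' "$rendered" | grep -Eq '<[A-Z_]+>'; then
    echo "Unsubstituted placeholder left in SDL" >&2
    return 1
  fi
  printf '%s\n' "$rendered"
}
```

In deploy.sh this could replace the inline sed call, for example SDL=$(render_sdl deploy.yml).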

With the script in place, we create the deployment, poll for bids, and wait.

The first attempt fails. No bids arrive. After 30 attempts, spaced 3 seconds apart, the deployment script exits and the pipeline errors out.
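
The bid loop hard-codes its window as 30 attempts with a 3 second sleep. A reusable helper with a configurable timeout might look like the sketch below. The function name poll_until and the BID_TIMEOUT variable in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# poll_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT seconds
# have elapsed. Both values are whole seconds.
poll_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    # Give up if the next attempt would start after the deadline.
    if [ $(( $(date +%s) + interval )) -gt "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage sketch, mirroring the bid loop in deploy.sh:
#   have_bids() {
#     BIDS=$(curl -fsS "https://console-api.akash.network/v1/bids/$DSEQ" \
#       -H "x-api-key: ${AKASH_API_KEY}")
#     [ "$(echo "$BIDS" | jq '.data | length')" -gt 0 ]
#   }
#   poll_until "${BID_TIMEOUT:-90}" 3 have_bids || exit 1
```

Because the command is passed as arguments, the same helper can serve the readiness check later in the script.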

Step 3: Diagnosis and Fixing the SDL

The Coordinator Agent investigates. The SDL has two problems:

Problem 1: signedBy Constraints

The original SDL included a signedBy block with two addresses baked in. In the SDL, signedBy lists auditor accounts, and only providers whose attributes have been signed by one of those auditors may bid. None of the providers that met this requirement bid at the time of deployment. The result: zero bids.

A populated anyOf list therefore narrows the deployment to the subset of providers audited by the listed accounts. If none of those providers participate, the deployment receives no bids, however long the script waits.

Fix: Remove the signedBy block entirely. Omitting the constraint opens bidding to all active providers in the Akash marketplace.

Problem 2: Pricing Too Low

The original pricing was set to 1000 uakt, which acts as a ceiling on what providers may bid. At the time of deployment, the market had shifted and this price point was no longer competitive. Providers were ignoring the deployment in favour of higher-paying workloads.

Fix: Bump pricing to 10000 uakt. On the next attempt, ten bids arrive within seconds. The cheapest provider offers 0.74 uakt/hr.

These are not bugs you find in a compiler or a linter. They are market dynamics and configuration assumptions: exactly the kind of operational problem where a Review Worker’s critique and a Coordinator’s systems thinking shine.
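
Both problems can be caught before submitting the SDL with a simple text check. The sketch below is an assumption-laden lint, not a schema validator: the function name sdl_lint is hypothetical, and the default threshold of 10000 uakt is simply the value that attracted bids in this walkthrough, so it would need adjusting as the market moves.

```shell
#!/usr/bin/env bash
# sdl_lint FILE [MIN_PRICE]
# Warn about the two issues diagnosed above; returns non-zero if any warning fired.
sdl_lint() {
  local sdl="$1" min_price="${2:-10000}"
  local status=0 amount amounts
  if grep -Eq '^[[:space:]]*signedBy:' "$sdl"; then
    echo "warn: signedBy is set, so only a subset of providers can bid" >&2
    status=1
  fi
  # Collect every integer "amount: N" line (the pricing entries in the SDL).
  amounts=$(sed -nE 's/^[[:space:]]*amount:[[:space:]]*([0-9]+)[[:space:]]*$/\1/p' "$sdl")
  for amount in $amounts; do
    if [ "$amount" -lt "$min_price" ]; then
      echo "warn: pricing amount $amount uakt is below $min_price" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running sdl_lint deploy.yml in CI before deploy.sh turns a silent zero-bid wait into an immediate, explained failure.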

Step 4: Creating the Lease and Finding the Endpoint

With bids available, the script selects the cheapest provider and creates a lease. The Console API responds with the lease status, including the ingress endpoint assigned by the provider:

{
  "data": {
    "leases": [{
      "status": {
        "services": {
          "web": {
            "uris": [
              "qhesrugdbhc2j1kko6g8fiv3go.ingress.provider.com"
            ],
            "ready_replicas": 1
          }
        }
      }
    }]
  }
}

The Review Worker warned that providers need time to pull the image, especially from a private registry that requires authentication. We poll the deployment status every 5 seconds:

# Poll deployment status for ready replicas: 24 attempts, 5 seconds apart
echo "Checking service readiness..."
READY=0
for i in $(seq 1 24); do
  STATUS=$(curl -fsS "https://console-api.akash.network/v1/deployments/$DSEQ" \
    -H "x-api-key: ${AKASH_API_KEY}")
  # "// 0" treats a missing field as zero ready replicas
  READY=$(echo "$STATUS" | jq '.data.leases[0].status.services.web.ready_replicas // 0')
  echo "Attempt $i: ready_replicas=$READY"
  if [ "$READY" -gt 0 ]; then
    echo "Deployment ready!"
    break
  fi
  sleep 5
done

# Stop here if the provider never reported a ready replica
if [ "$READY" -eq 0 ]; then
  echo "Service not ready after 24 attempts" >&2
  exit 1
fi

# Extract and health-check the endpoint
URI=$(echo "$STATUS" | jq -r '.data.leases[0].status.services.web.uris[0]')
echo "Endpoint: http://$URI"
curl -fsS --max-time 10 "http://$URI"

After roughly 30 seconds the service reports ready_replicas: 1 and a health check against the public endpoint returns HTTP 200. Your container is now live on Akash, deployed on decentralised infrastructure and exposed to the internet.

Step 5: The Private Registry Pattern

The complete flow looks like this:

  1. CI builds the image using Docker-in-Docker, tags it with the Git commit SHA, and pushes it to the private registry.
  2. Deployment script substitutes the SDL template variables at runtime:
    • <REGISTRY_USER> → CI secret
    • <REGISTRY_PASS> → CI secret
    • :latest tag → the pinned SHA tag for reproducibility
  3. Akash providers authenticate to the registry using the injected credentials and pull the image.
  4. Container starts on the provider’s infrastructure. The Akash ingress system exposes it on a public endpoint.

The SDL carries authentication details inline, but at no point are those credentials written to the Git repository. They enter the system only at deployment time, via CI/CD environment variables passed to the deployment script.
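
Because every credential arrives through the environment, a missing CI variable is an easy mistake to make, and under set -u it surfaces only when the script first touches it. A preflight check at the top of deploy.sh can report all missing variables at once. This is a sketch; the function name require_env is hypothetical, while the variable names are the ones the script already uses.

```shell
#!/usr/bin/env bash
# require_env NAME...: fail if any named variable is unset or empty.
# Names are literals supplied by the script author, so eval is safe here.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch, before any API call:
#   require_env AKASH_API_KEY REGISTRY_USER REGISTRY_PASS BUILD_VERSION || exit 1
```

Only variable names are printed, never values, so the check itself does not leak secrets into the job log.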

Step 6: Tear Down

Akash deployments are charged continuously against the escrow deposit. When you want to stop paying, close the deployment:

# Close deployment: refunds remaining deposit
curl -fsS -X DELETE "https://console-api.akash.network/v1/deployments/$DSEQ" \
  -H "x-api-key: ${AKASH_API_KEY}"

The Console API returns the remaining escrow balance and the deployment transitions to closed state. If you forget this step, the provider continues billing your deposit until it is exhausted and the lease is closed automatically.
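
Teardown also matters when the script fails halfway, for example after the deployment is created but before a lease exists. One option, sketched below as a design choice rather than a documented practice, is to close the deployment from an EXIT trap whenever the script ends with an error. The function names close_deployment and cleanup_on_exit are hypothetical; the DELETE call is the one shown above.

```shell
#!/usr/bin/env bash
# close_deployment DSEQ: issue the DELETE request shown above.
close_deployment() {
  local dseq="$1"
  curl -fsS -X DELETE "https://console-api.akash.network/v1/deployments/${dseq}" \
    -H "x-api-key: ${AKASH_API_KEY}"
}

# cleanup_on_exit STATUS: close the deployment only if the script failed
# after a deployment was created (DSEQ is set).
cleanup_on_exit() {
  local status="$1"
  if [ "$status" -ne 0 ] && [ -n "${DSEQ:-}" ]; then
    echo "Script failed (exit $status); closing DSEQ=$DSEQ" >&2
    close_deployment "$DSEQ" || echo "Close failed; close DSEQ=$DSEQ manually" >&2
  fi
}

# In deploy.sh, right after `set -euo pipefail`:
#   trap 'cleanup_on_exit "$?"' EXIT
```

A successful run leaves the deployment untouched, so the explicit teardown above is still needed when the service is no longer wanted.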

Key Decisions and Tradeoffs

  • Pin tags with Git commit SHA
    Rationale: reproducible deployments where the Akash SDL points to an exact image.
    Tradeoff: the SDL or deployment template must be updated for every release.

  • Remove signedBy constraints
    Rationale: allow all active providers to bid.
    Tradeoff: less direct control over the provider trust model.

  • Use a competitive uakt price
    Rationale: attract bids under current market conditions.
    Tradeoff: higher cost than the theoretical minimum.

  • Pass registry credentials through the SDL at deployment time
    Rationale: required for provider-side private image pulls.
    Tradeoff: credentials can leak if logs are too verbose, so scripts should avoid shell tracing and verbose curl output.

  • Use the managed wallet API with x-api-key
    Rationale: avoids self-custody key management inside CI.
    Tradeoff: depends on Akash Console API availability.

The Review Worker flagged each of these tradeoffs. The Coordinator Agent made the final calls. The Implementation Worker ensured the scripts executed correctly.
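
The credential-leak tradeoff can be reduced further by masking secrets in whatever the script does print. The filter below is a sketch under the assumption that secrets are single-line values held in environment variables; the function name redact is hypothetical.

```shell
#!/usr/bin/env bash
# redact NAME...: copy stdin to stdout, replacing the value of each named
# variable with [REDACTED]. Matching is literal, not regex-based.
redact() {
  local line name value out rest prefix
  while IFS= read -r line || [ -n "$line" ]; do
    for name in "$@"; do
      eval "value=\${$name:-}"
      [ -n "$value" ] || continue
      out="" rest="$line"
      while :; do
        case "$rest" in
          *"$value"*)
            prefix=${rest%%"$value"*}   # text before the first occurrence
            out="${out}${prefix}[REDACTED]"
            rest=${rest#*"$value"}      # text after the first occurrence
            ;;
          *) break ;;
        esac
      done
      line="${out}${rest}"
    done
    printf '%s\n' "$line"
  done
}

# Usage sketch:
#   ./deploy.sh 2>&1 | redact REGISTRY_PASS AKASH_API_KEY
```

This complements, rather than replaces, the advice to keep shell tracing and verbose curl output switched off.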

What This Pattern Enables

Using a multi-agent system for deployment does not just speed up the work. It changes the nature of the debugging process:

  • Parallel investigation: While the Implementation Worker was rewriting the CI pipeline to pin image tags to the commit SHA, the Review Worker was analysing SDL semantics and price dynamics.
  • Persistent artifacts: Every agent output is a file in a Git repo. The deployment script, the SDL manifest, and the CI configuration all have version history.
  • Cross-domain critique: The Review Worker spotted that an overly restrictive signedBy block is a silent failure mode: not a syntax error, but a semantic one that blocks all bidding. A single developer might have stared at the YAML for hours before realising the cause was marketplace dynamics rather than syntax.
  • Standardised workflow: The deploy akash skill captures the complete workflow (prerequisites, API calls, error modes, and teardown) so future agents inherit proven deployment logic.

Closing

Akash provides an alternative deployment model for teams interested in decentralised infrastructure, marketplace-based compute allocation, and reduced dependence on traditional hyperscalers.

Deploying to a decentralised compute marketplace is not just about writing the right YAML. It is about understanding marketplace dynamics, authentication flows, container image lifecycles, and operational monitoring, all at once.

A multi-agent system with specialised workers makes this tractable. The Implementation Worker handles the build pipeline. The Review Worker validates assumptions and catches silent failures. The Coordinator Agent integrates the pieces and manages the deployment lifecycle.

Using AkashML-backed Hermes agents, the workflow can move from repository to deployment quickly while maintaining review and operational validation steps.

