The DAO Builds Its First Product
Building an Autonomous AI Business: Part III
By Thomas Heimann | Founder, AgentZero OpenClaw DAO LLC & Cloud Title
From Platform to Product
Part I was about vision. Part II was about finding the cracks in the foundation and fixing them. This one is different.
In the 25 days since Part II, the DAO stopped being purely a platform under construction and started being a business that makes things. Real things. A 512GB Mac Studio is now the heart of the operation. A 250GB local AI model is running on that machine, trained by nobody, billed by nobody, calling home to nobody. And the first product to emerge from the DAO's idea-to-business pipeline is live in beta: Sarah, a real estate executive assistant that lives in your WhatsApp.
This is the story of the most consequential 25 days of the build so far.
The Mac Studio Arrives: Phase 2 Begins
The Mac Studio M3 Ultra with 512GB of unified memory arrived in early April, and the migration from the Mac Mini was methodical. Peter captured a full baseline snapshot before the transfer. Migration Assistant moved the entire account over Thunderbolt. The static IP was reassigned. Every service was verified: OpenClaw gateway on port 18789, both Convex backends on 3210 and 3212, dashboards on 3000 and 3001, Hermes webhook relay via Cloudflare Tunnel, Tailscale at 100.116.133.114 for remote access. All five ports healthy. All twelve agents confirmed. Three PM2 processes online.
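A post-migration check like the one described above can be scripted. The following is a minimal sketch, assuming the five ports quoted in the text are reachable on the local loopback interface. The service labels are informal names chosen for illustration, and the check only confirms that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import socket

# Ports quoted in the article; the labels are informal and illustrative.
SERVICES = {
    "OpenClaw gateway": 18789,
    "Convex backend A": 3210,
    "Convex backend B": 3212,
    "Dashboard A": 3000,
    "Dashboard B": 3001,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1",
                   services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service label to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, healthy in check_services().items():
        print(f"{name:<20} {'ok' if healthy else 'DOWN'}")
```

Running the same script before and after a migration gives a quick diff of which listeners came back.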
The Mac Mini did its job. It proved the architecture works. But 16GB of unified memory was always going to be the ceiling. The Mac Studio removes that ceiling. 512GB means a 250GB AI model loads with room to spare. It means concurrent agent execution without contention. It means the infrastructure can actually grow into the ambition.
A second Mac Studio is expected within the next few weeks for the Exo cluster setup. When it arrives, the two machines will distribute workloads across a shared memory pool. The 397B parameter Qwen flagship, which requires well over 400GB at full precision, becomes viable then. For now, the single Studio running the 122B model is already a fundamental shift in what's possible.
Going Local: Qwen3.5-122B on Bare Metal
The local LLM rollout I've been planning since Part I is now real. Qwen3.5-122B is running on the Mac Studio in BF16 precision at approximately 250GB: not a quantized compromise, but the full-weight model. It runs via Ollama on a custom provider profile called "frob." The OpenClaw model identifier is ollama/frob/qwen3.5:122b-a10b-bf16.
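The memory figures follow from simple arithmetic: BF16 stores each parameter in 16 bits, so the weights alone take roughly two bytes per parameter. The sketch below is a back-of-the-envelope estimate, assuming the nominal parameter counts in the model names and ignoring KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return params_billions * bits_per_param / 8


if __name__ == "__main__":
    for label, params in (("122B", 122), ("397B", 397)):
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{label} at {bits}-bit: ~{size:.1f} GB")
```

By this estimate the 122B model needs about 244 GB for weights in BF16, consistent with the roughly 250GB quoted above, while a 397B model in BF16 needs about 794 GB, which is more than a single 512GB machine holds.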
The first two agents to migrate to local inference are Sage and Scout. Both are now running on Qwen3.5-122B as their primary model, with GPT-5.4 as fallback. The early results are strong. Sage's calendar management, daily briefings, and operational reports run locally with zero API cost. Scout's research tasks, including the 5AM X Intelligence Brief, run locally. Two agents that were burning API tokens 24/7 are now free to run.
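The primary-plus-fallback arrangement can be illustrated generically. This is a minimal sketch, not how OpenClaw implements fallback: each provider is modeled as a plain callable taking a prompt and returning text, and the names used in the usage note are placeholders.

```python
from typing import Callable, Sequence

# A provider is a (label, callable) pair; the callable is a stand-in for a real client.
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def complete_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (label, reply) from the first that succeeds."""
    failures: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            failures.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(failures))
```

A call such as `complete_with_fallback(prompt, [("local-qwen", call_local), ("gpt-fallback", call_api)])` returns which provider answered, which makes it easy to log how often the fallback is actually used during validation.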
The rollout plan is deliberately phased. Sage and Scout are the low-blast-radius pilots. Next will come Forge and content-cloudtitle for benchmarking on content tasks. AgentZero and Peter are last: the orchestrator and the infrastructure guardian carry the most operational risk and stay on proven models until local performance is thoroughly validated. Atlas stays on Claude Opus, Canvas on Claude Sonnet. That's not ideology; it's economics. The quality delta on strategic and creative work justifies the API cost.
One technical note worth sharing: Qwen3.5 is a thinking-capable model and must be registered in OpenClaw with reasoning: true in the provider config. Without that flag, OpenClaw's 120-second watchdog fires before the model finishes its internal reasoning pass and the call silently times out. We discovered this the hard way during the initial Sage pilot. The fix was a one-line config change; finding it required a few hours of diagnosis. This is now documented as a standing rule in the framework.
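The failure mode is easier to reason about with a generic model of a watchdog. The sketch below is purely illustrative and is not OpenClaw's implementation: it assumes a framework picks a longer time budget for models flagged as reasoning-capable, and it raises a descriptive error instead of failing silently. The 600-second figure and all names are hypothetical; only the 120-second default comes from the text.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

DEFAULT_TIMEOUT_S = 120.0    # watchdog figure quoted in the article
REASONING_TIMEOUT_S = 600.0  # hypothetical longer budget, not an OpenClaw value


@dataclass(frozen=True)
class ModelProfile:
    name: str
    reasoning: bool = False


def watchdog_timeout(profile: ModelProfile) -> float:
    """Pick the time budget based on whether the model runs a reasoning pass."""
    return REASONING_TIMEOUT_S if profile.reasoning else DEFAULT_TIMEOUT_S


async def guarded_call(profile: ModelProfile,
                       call: Callable[[], Awaitable[str]],
                       timeout: Optional[float] = None) -> str:
    """Await call() under the watchdog and fail loudly if the budget is exceeded."""
    budget = watchdog_timeout(profile) if timeout is None else timeout
    try:
        return await asyncio.wait_for(call(), budget)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"{profile.name} exceeded the {budget}s watchdog") from exc
```

The point of the explicit error is diagnostic: a timeout that names the model and the budget would have pointed at the missing flag much sooner than a silent drop.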
The longer-term goal: once the fleet's local inference is validated, downgrade the ChatGPT Pro subscription to Plus. That's $180/month back in the operating budget. The local model doesn't make API obsolete, but it makes API optional for most of the fleet.
OpenClaw 4.14: The Upgrade That Fixed a Hidden Fleet Problem
OpenClaw was upgraded from version 2026.4.2 to 2026.4.14 in mid-April. The process followed the same five-phase protocol as the previous upgrade: pre-flight plugin audit, backup, update and doctor, config verification, gateway restart and reboot validation.
Two issues surfaced during the upgrade. First, lossless-claw 0.5.1 proved incompatible with 4.14's tightened plugin contract (GitHub issue #66591). Every agent reply attempt failed until a manual force-install of lossless-claw 0.9.1 resolved it. Second, the local patch for the snapshot.skills upstream bug (the same Array.isArray guard issue patched during the 4.2 upgrade) had to be re-applied at the new bundle path. This will happen on every future OpenClaw upgrade until the upstream fix is absorbed. It's now a standing post-upgrade checklist item.
But the more significant finding from the 4.14 upgrade wasn't a bug; it was the resolution of one. The GPT-5.4 execution drift pattern documented in Part II (Peter accepting directives, emitting status language, failing to execute actual tool-use chains) no longer reproduces under 4.14. The turn-recovery fix in 4.14 addressed the root cause. This was validated by having Peter complete a comparable read-only audit: completed in four minutes, clean tool-use execution throughout. The drift pattern is gone.
This changes the urgency profile of the Qwen rollout. When GPT-5.4 was unreliable for multi-step implementation tasks, migrating to local inference was partly a reliability play. Now that the execution discipline issue is resolved, the local rollout is purely what it should be: a cost and privacy optimization. That's a healthier foundation to build on.
The AgentZero Silence Incident
On April 10th, AgentZero went quiet. Fresh direct messages from me were producing no responses: not errors, not apologies, nothing. Just silence.
Peter's investigation identified the mechanism: NO_REPLY and REPLY_SKIP control tokens from prior session context were contaminating AgentZero's active context window. The model was seeing historical silence directives and applying them to fresh direct messages. It was following its instructions exactly. The instructions were wrong.
The fix was a Direct Message Precedence Guard: a targeted SOUL.md insertion that explicitly declares historical control tokens as inert when a fresh direct message from Thomas is present, mandates acknowledgment or result for every direct message, and marks all four rules as non-overridable by replayed context. The guard was written, tested against three live scenarios, and then propagated to all 12 active agent SOUL.md files fleet-wide. Every agent in the DAO now carries it.
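The guard described above is a prompt-level rule. A complementary code-level defense, sketched here as an assumption rather than something the fleet actually runs, would strip replayed control tokens before the context reaches the model and refuse to send an empty or token-only reply to a fresh direct message. The `Message` type, the `replayed` flag, and the fallback text are all hypothetical.

```python
from dataclasses import dataclass

# Control tokens named in the article.
CONTROL_TOKENS = frozenset({"NO_REPLY", "REPLY_SKIP"})


@dataclass(frozen=True)
class Message:
    sender: str
    text: str
    replayed: bool = False  # True when restored from a prior session


def is_control_token(text: str) -> bool:
    """True when the text is nothing but a known control token."""
    return text.strip() in CONTROL_TOKENS


def build_context(history: list[Message], fresh_dm: Message) -> list[Message]:
    """Drop replayed control tokens so they cannot silence a fresh direct message."""
    kept = [m for m in history if not (m.replayed and is_control_token(m.text))]
    return kept + [fresh_dm]


def enforce_reply(draft: str, fallback: str = "Acknowledged. Working on it.") -> str:
    """Guarantee that a direct message receives some visible response."""
    if not draft.strip() or is_control_token(draft):
        return fallback
    return draft
```

Filtering in code and instructing in the prompt are not mutually exclusive; the first removes the contaminating input, the second covers cases the filter does not anticipate.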
The incident reinforced something important about running a live AI agent fleet: the failure modes are rarely what you expect. AgentZero wasn't broken. It was doing exactly what accumulated context told it to do. Designing against context contamination is as important as designing the agent's core behavior.
Meet Sarah: The DAO Builds Its First Product
This is the one I've been most excited to write about.
About two weeks ago, we launched a prototype called Sarah at meetsarah.io. Sarah is a real estate executive assistant that lives in WhatsApp. She manages your calendar, reads and drafts emails, identifies leads, and prepares your day, all from a single WhatsApp conversation. She's currently in closed beta with her first real users.
This is the DAO's pipeline working end-to-end for the first time. The idea was evaluated by Atlas. The architecture was designed in this CTO channel. The build was executed using Hermes Agent (by Nous Research), a single-agent framework that turns out to be the right tool for a product with per-client isolation requirements.
Why Hermes, Not OpenClaw
OpenClaw is the DAO's operating system. It runs the 12-agent fleet. It's powerful, but it's also complex: designed for orchestrated multi-agent coordination, not for a clean per-client product model.
Hermes has a profile system where each profile gets its own SOUL.md, memory, skills, sessions, and gateway. That maps naturally to the product requirement: each client gets their own isolated agent instance with their own context, their own memory, their own personality calibration. Hermes installs to "~/.hermes/" alongside OpenClaw at "~/.openclaw/" with no conflicts. Two frameworks, two purposes, one machine.
Sarah is completely standalone: not routed through AgentZero, not dependent on the DAO's orchestration layer. This was a non-negotiable architectural decision. Productization requires cloneability. If Sarah becomes a product deployed for hundreds of clients, the deployment can't require understanding the entire OpenClaw DAO architecture to install.
The Technical Stack
Sarah's infrastructure is straightforward but production-grade:
WhatsApp Business API via Twilio on number +19412394709. Clients message one number; the routing layer identifies them by the incoming phone number and loads their Hermes profile.
Cloudflare Tunnel as the persistent webhook relay, running as a launchd service routing to localhost:8080. No exposed ports, no static IP dependency, zero downtime on ISP changes.
Google OAuth and Microsoft OAuth for calendar and email access, with Fernet symmetric encryption at rest for all stored tokens. A context manager pattern handles token refresh: if Google's client library auto-refreshes and writes back to disk, the system detects the modification, re-encrypts, and cleans up. No plaintext credential exposure.
Codex OAuth (GPT-5.4) as the LLM provider for Sarah's conversations. The Hermes agent uses the same ChatGPT Pro subscription that powers the DAO fleet.
Asana via MCP for task tracking, connected to the V2 endpoint. Telegram also connected for internal monitoring.
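The token-refresh pattern in the OAuth item above can be sketched as a context manager. This is an illustration under stated assumptions, not Sarah's actual code: the cipher is injected as a pair of callables so the example stays dependency-free, and modification is detected by comparing file contents rather than timestamps. The function and parameter names are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator

# Any bytes -> bytes transform; in practice an authenticated cipher.
Cipher = Callable[[bytes], bytes]


@contextmanager
def decrypted_token_file(encrypted_path: Path,
                         encrypt: Cipher,
                         decrypt: Cipher) -> Iterator[Path]:
    """Yield a temporary plaintext copy of an encrypted token file.

    If the plaintext copy is rewritten while in use (for example by a client
    library refreshing an OAuth token), the new contents are re-encrypted back
    to encrypted_path. The plaintext copy is always deleted on exit.
    """
    plaintext = decrypt(encrypted_path.read_bytes())
    fd, tmp_name = tempfile.mkstemp(suffix=".json")  # created with owner-only access
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(plaintext)
        yield tmp_path
        current = tmp_path.read_bytes()
        if current != plaintext:  # the library wrote a refreshed token
            encrypted_path.write_bytes(encrypt(current))
    finally:
        tmp_path.unlink(missing_ok=True)
```

With the `cryptography` package, the two callables would be the `encrypt` and `decrypt` methods of a `Fernet(key)` instance. If the body of the `with` block raises, the refreshed contents are not written back, but the plaintext file is still removed.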
What Sarah Does
In the current beta, Sarah handles:
Daily briefings: calendar review, email triage, priority identification each morning
Calendar management: scheduling, viewing availability, appointment confirmation
Email drafting: composing messages for review and approval before sending
Lead identification: scanning communications for potential real estate opportunities
Address book access: contact lookup and context for communications
Phase 2 is voice AI via Vapi or Bland for outbound calls. SMS was deliberately deferred given STOP command compliance complexity on Twilio. The focus for the current beta is making the WhatsApp experience genuinely useful before expanding channels.
The Scaling Architecture
From day one, Sarah was designed with the scaling path in mind. The question "how do you support thousands of users on one WhatsApp number?" has a clean answer in Hermes: clients are identified by the phone number in the incoming "From" field. One number, one routing layer, N client profiles. It's the same architecture as a customer service chatbot supporting thousands of concurrent sessions: not separate bots, but one bot with intelligent client isolation.
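The routing idea reduces to a lookup keyed on the sender. The sketch below assumes the webhook delivers the sender as a string of the form `whatsapp:+<E.164 number>`, which is how Twilio formats WhatsApp senders, and that profiles are stored in a simple mapping from phone number to profile name. The profile names and phone numbers are fictional, and how a Hermes profile is actually loaded is outside the scope of this example.

```python
from typing import Optional

WHATSAPP_PREFIX = "whatsapp:"


def normalize_sender(from_field: str) -> str:
    """Strip whitespace and the channel prefix, leaving the bare phone number."""
    value = from_field.strip()
    if value.startswith(WHATSAPP_PREFIX):
        value = value[len(WHATSAPP_PREFIX):]
    return value


def resolve_profile(from_field: str, profiles: dict[str, str]) -> Optional[str]:
    """Return the profile name for this sender, or None for an unknown number."""
    return profiles.get(normalize_sender(from_field))


if __name__ == "__main__":
    profiles = {"+15555550101": "client-a", "+15555550102": "client-b"}
    print(resolve_profile("whatsapp:+15555550101", profiles))
    print(resolve_profile("whatsapp:+15555550199", profiles))
```

Returning `None` for unknown numbers keeps the decision about onboarding or rejecting a stranger in the caller, separate from the routing itself.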
The tier model:
Tier 1 (Now, Beta): Single Hermes instance, shared token quota, first real users validating the product concept
Tier 2 (Post-Second Mac Studio): Per-user agent provisioning, local LLM inference at near-zero marginal cost per conversation, user management tooling
Tier 3 (Post-Cluster): Multi-gateway infrastructure, shared database for profiles and billing, containerized, load-balanced across the Studio cluster
The architecture doesn't need to be rebuilt at each tier. The SOUL.md, skills, conversation logic, and WhatsApp number stay constant. What changes is the infrastructure underneath it.
GHL Integration: Sovereignty Over Convenience
GoHighLevel is the CRM powering Cloud Title's customer-facing operations, with roughly 1,800 contacts in scope for lead follow-up, pipeline movement, appointments, and conversations. Integrating the agent fleet with GHL has been on the roadmap for a while. How to do it was the question.
A platform called MoltClaw entered the picture: an official HighLevel-sponsored fork of OpenClaw in beta, designed specifically for GHL integration. I evaluated it seriously. I attended the live demo. The conclusion: MoltClaw is architecturally incompatible with the DAO's infrastructure principles.
MoltClaw is a fully hosted VPS product locked to OpenAI as the LLM backend. It's not a bridge to self-hosted OpenClaw; it's a replacement that lives on someone else's server. For a system built around local hardware, local inference, and infrastructure sovereignty, that's a hard no. The beta pricing is attractive ($14/month locked through June 1st), but attractive pricing on the wrong architecture is still the wrong architecture.
The decision: build the GHL integration using MCP plus HighLevel's documented REST API, with self-authored skills running on the DAO's own infrastructure. This preserves LLM backend flexibility, keeps business logic portable, and doesn't create a dependency on a third-party VPS provider for a core revenue operations workflow. It's more work upfront. It's the right answer long-term.
Implementation begins once the local Qwen model is fully validated in production. The GHL connection is predicated on local inference being stable, since API cost exposure at CRM automation volume is non-trivial.
Fleet State: 12 Agents, Two Frameworks, One Machine
The DAO fleet now stands at 12 active long-lived agents on OpenClaw, plus Sarah running as a separate Hermes instance. The model map has evolved:
Atlas: Claude Opus 4.6 (Anthropic API). Strategic reasoning. Unchanged.
Canvas: Claude Sonnet 4.6 (Anthropic API). Content creation. Unchanged.
Sage, Scout: Qwen3.5-122B (local, Mac Studio). First agents on local inference.
AgentZero, Peter, Nexus, Radar, Pixel, Forge, Echo, content-cloudtitle: GPT-5.4 primary (ChatGPT Pro), Kimi K2.5 fallback. Local migration pending validation.
The framework itself is on OpenClaw 2026.4.14. Lossless Claw at version 0.9.1. The snapshot.skills local patch re-applied at the new bundle path. Peter running nightly CTO audits with a comprehensive checklist that now includes Qwen model health, local inference latency monitoring, and the standing reminder to re-apply the snapshot.skills patch after every future upgrade.
One operational discipline update worth noting: Peter's reports now go to his AgentZero-Drive export folder, not to the #peter-cto Slack channel. That Slack channel is Peter's outbound-only reporting surface; he can't post to it directly. Directives to Peter travel via Telegram. This sounds like a small thing; in practice it's the kind of routing clarity that prevents a lot of confusion when you're managing 12 agents across multiple communication channels.
Running a DAO from Anywhere
A 512GB Mac Studio is meaningless if you can't reach it when you're not in the office. Remote access was a gap that got properly solved during this period.
The setup: Tailscale for the VPN layer, with the Mac Studio at a stable 100.116.133.114. A launchd plist at "~/Library/LaunchAgents/com.thomasheimann.macstudio-tunnel.plist" auto-starts an SSH tunnel at laptop login, forwarding the critical ports (3000, 3001, 3210, 3212, the Hermes dashboard at 9119, and 18789 mapped to 8081 for the OpenClaw WebChat) over SSH with key authentication and no password. The Mac Studio is now travel-ready.
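The tunnel itself amounts to a single `ssh` invocation with one `-L` forward per port. The sketch below builds that argument list, reading "18789 mapped to 8081" as local port 8081 forwarding to remote port 18789, which is an interpretation of the text. The login name in the usage note is a placeholder, and the exact arguments in the author's plist are not shown in the article.

```python
# (local_port, remote_port) pairs taken from the ports listed in the article.
FORWARDS = [
    (3000, 3000),
    (3001, 3001),
    (3210, 3210),
    (3212, 3212),
    (9119, 9119),   # Hermes dashboard
    (8081, 18789),  # OpenClaw WebChat, assumed direction: local 8081 -> remote 18789
]


def build_tunnel_command(host: str,
                         forwards: list[tuple[int, int]] = FORWARDS) -> list[str]:
    """Build an ssh argument list that forwards each local port to the remote loopback."""
    # -N: no remote command, forwarding only.
    # ExitOnForwardFailure: exit instead of running with a missing forward.
    cmd = ["ssh", "-N", "-o", "ExitOnForwardFailure=yes"]
    for local, remote in forwards:
        cmd += ["-L", f"{local}:localhost:{remote}"]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_tunnel_command("user@100.116.133.114")))
```

Each forward targets `localhost` on the remote side, so services that listen only on the loopback interface of the Mac Studio remain reachable through the tunnel.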
One non-obvious caveat that bit us initially: all dashboard URLs must use "localhost", not the Tailscale IP. The services bind to 127.0.0.1, and the page JavaScript makes its own localhost data fetches. Using the Tailscale IP for the initial page load works; the subsequent API calls from the browser then fail. This is documented now. It won't surprise the next operator.
What This Phase Actually Means
Parts I and II were about building the operating system. This part is when the operating system started to matter in the real world.
Sarah exists because the DAO's pipeline evaluated the idea, found it viable, and spun up a build. That's the system doing what it was designed to do. The Hermes framework was evaluated, selected, installed, configured, and productized in roughly two weeks, not because I'm unusually fast at this, but because the agent infrastructure around me has gotten capable enough to accelerate implementation significantly.
The local LLM running on the Mac Studio means the DAO is no longer entirely dependent on third-party API providers for its operational intelligence. Two agents are already free. The others will follow as validation data accumulates. In a year, "running your own AI" will be table stakes for serious operators. We're building that infrastructure now.
And the GHL integration decision, building on MCP and REST rather than adopting a locked hosted platform, is a values statement about what kind of infrastructure is worth building. Fast wins that create long-term dependencies aren't wins. The right answer takes longer. It also lasts longer.
Where We Are Today
Completed Since Part II
Mac Studio 512GB migration: complete, production stable
Qwen3.5-122B local LLM: running, Sage and Scout migrated
OpenClaw 4.14 upgrade: complete including lossless-claw 0.9.1 and snapshot.skills re-patch
GPT-5.4 execution drift: resolved by 4.14, validated
AgentZero NO_REPLY incident: resolved, Direct Message Precedence Guard fleet-wide
Sarah / RE EA: launched in beta at meetsarah.io on Hermes Agent
OAuth token encryption at rest: Fernet symmetric, deployed and verified
Remote access: Tailscale + launchd SSH tunnel, travel-ready
MoltClaw evaluation: complete, rejected, GHL integration path decided
In Progress
Qwen local inference validation: Forge and content-cloudtitle benchmark next
GHL integration: MCP + REST, begins post-Qwen validation
Sarah Phase 2: voice AI via Vapi/Bland for outbound calls
Second Mac Studio: expected soon, Exo cluster setup and 397B Qwen deployment
ChatGPT Pro: downgrade to Plus targeted once fleet local migration is validated
The Honest Summary
Part I: here's what I'm building. Part II: here's how it actually broke and how I fixed it. Part III: here's the first thing the system built.
Sarah being live in beta is significant not because it's a finished product (it's not) but because it proves the pipeline. An idea went in. Two weeks later, real people are using a real product. The DAO isn't just running itself; it's starting to build things.
The local LLM is significant not because Qwen3.5-122B is the final answer (the 397B model coming post-cluster will likely be) but because the direction is set. The fleet is moving off API dependency, one agent at a time, toward infrastructure that Thomas Heimann controls and nobody else bills for.
There will be a Part IV. It will probably be about the second Mac Studio arriving, the Exo cluster coming online, and what happens when you give the fleet enough compute to run the really big models locally. Or it might be about Sarah's first 50 users. Probably both.
Still building. One agent, one product, one machine at a time.
Thomas Heimann is the founder of Cloud Title and AgentZero OpenClaw DAO LLC, the author of "Twice Ahead: How to Build an Agent-powered Business Before Your Industry Catches Up", and the creator of Sarah (meetsarah.io). He has been building at the technology frontier for three decades, from the early internet era through real estate technology to autonomous AI systems.
Follow the Building in Public series: #OpenClaw #AgentZero #BuildingInPublic #AgenticAI #FutureOfWork
Part IV will cover the Exo cluster setup, the 397B local model deployment, and Sarah's growth beyond the founding beta.












