I'd been putting it off since the news broke. Last weekend, I dug out my old MacBook with an outdated OS, wiped it, hardened it, and got down to business. I finally caved and installed OpenClaw.
But OpenClaw was just the latest step in a progression that echoes what many practitioners are experiencing at the enterprise level right now.
The Progression
I started where most did: ChatGPT for fast, general-purpose queries. Then Claude for deeper reasoning and writing. Then Notion AI (Claude/OpenAI wrapper) for working inside my existing knowledge base. On the development side, V0 to Cursor to Claude Code and Codex.
Claude plus Notion was a first-mover integration that became the strongest hybrid pattern I found: reasoning, planning, and writing grounded in persistent context without manual export. Claude Cowork, with local file access, expanded that capability considerably (and the plug-ins and integrations keep coming).
And then OpenClaw: the full-control, open architecture agent that you build programmatically. My hardened setup required exporting documents from Notion, which immediately forked my context (more on that below). But the raw capability is real.
Each layer added capability and redundancy. The question is which tradeoffs I’d make (and at what cost).
That's a microcosm of the enterprise problem. One person, multiple architectural approaches, no coordination framework. Scale that to an organization with dozens of teams and hundreds of workflows, and the structural gaps compound quickly.
How Things Compare
Context: The Part That Worked
I'll give myself credit where it’s due. Feeding the tools the context they needed was the closest thing to a solved problem in my setup.
Claude reading Notion directly was the strongest pattern. Reasoning and writing grounded in a single source of truth. I structured my knowledge base, maintained persistent context, and the agents used it well.
The one failure point was any workflow requiring content export. OpenClaw's setup required exporting documents from Notion; instant desync. Edits in Notion didn't propagate, and two sources of truth are zero sources of truth. To solve that, I gave it read-only access to Notion almost immediately (weakening my security posture). The same failure appears in writing workflows: draft in Notion, edit in Word, finalize the formatting. Word edits never flow back to the source, so context is lost immediately.
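The desync problem above can be made concrete: once a document is exported, nothing ties the copy to its source. A minimal sketch of a drift check, comparing content hashes before an agent is allowed to read the export (the function names and hashing approach are my own illustration, not part of Notion, OpenClaw, or any other tool):

```python
import hashlib
from pathlib import Path

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def is_stale(exported: Path, source_text: str) -> bool:
    """True if the exported copy no longer matches the source of truth."""
    return content_hash(exported.read_text(encoding="utf-8")) != content_hash(source_text)
```

A check like this only detects the fork; it doesn't heal it. That is why I ended up granting read-only access to the source instead.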
Context only worked for me because of scale: a small, well-defined set of tasks, a single knowledge base, and one person maintaining everything. It's obvious where this breaks. At enterprise scale, it becomes the core infrastructure question: which systems feed agents the data, metadata, and semantic models they need, and how do you keep all of that synchronized across platforms that were never built to talk to each other?
Execution: The Part That Didn't
Almost every multi-step task that required handoffs, reformatting, or chaining outputs to inputs broke at some point. No architecture I tested fully owned execution natively. I was the orchestrator: the LLMs could reason about what needed to happen, but they couldn't make it happen across systems. Claude's integrations and OpenClaw's open architecture brought a new level of flexibility, but design and governance became far more important.
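"Chaining outputs to inputs" is trivial to express in code, which is exactly what made doing it by hand so galling. A sketch of the orchestration I was performing manually, with each step's output feeding the next step's input (the step names are hypothetical stand-ins for real tools):

```python
from typing import Callable

def run_pipeline(steps: list[Callable[[str], str]], initial: str) -> str:
    """Chain each step's output into the next step's input."""
    data = initial
    for step in steps:
        data = step(data)
    return data

# Hypothetical stand-ins for "draft in one tool, reformat, finalize in another".
draft = lambda s: s + " -> drafted"
reformat = lambda s: s + " -> reformatted"
result = run_pipeline([draft, reformat], "brief")
```

The loop is the easy part. The hard part, and the part no architecture I tested owned, is making each `step` a secure, authenticated call into a different vendor's system.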
At enterprise scale, the "agent execution layer" still doesn't exist in most organizations. Emerging protocols like Model Context Protocol (MCP) and Agent2Agent (A2A) are trying to solve this, but the reason I was manually reformatting and re-prompting is that there is no secure, standardized way for these architectures to coordinate work. Enterprises still have almost no evidence base for assessing whether these protocols will hold up in production.
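For a sense of what a standardized coordination layer looks like on the wire: MCP messages are JSON-RPC 2.0, and a host invokes a server's tool with a `tools/call` request. A minimal sketch of building one (the tool name `search_notes` and its arguments are hypothetical examples, not any real server's interface):

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request (MCP messages are JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# A host would send this to an MCP server over stdio or HTTP; "search_notes"
# is a made-up tool name for illustration.
msg = mcp_tool_call(1, "search_notes", {"query": "quarterly plan"})
```

The protocol standardizes the envelope, not the trust model, which is where the enterprise evidence gap sits.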
Observability: When Execution Fails, You Need to See Where
When a multi-step task fails, you need to trace where it went wrong.
OpenClaw offered more visibility into agent behavior but required technical skill to interpret. Claude and ChatGPT reduced friction but hid decision logic. The only oversight in my setup was me watching the terminal output and checking whether the results looked right.
EMA's research found that 63% of organizations will only enable AI-driven automated actions with human oversight. But watching a terminal and hoping you catch the failure isn't governance. Agent observability is just emerging as a discipline, and it needs to be structural: traceable actions, auditable decisions, reversible outputs. And it must come before deployment, before agents are making consequential decisions autonomously.
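To make "traceable actions, auditable decisions" concrete, here is a minimal sketch of structural observability: every agent action is recorded whether it succeeds or fails, so a broken multi-step task can be traced after the fact. The wrapper and field names are my own illustration, not any vendor's API:

```python
import time
from typing import Any, Callable

AUDIT_LOG: list[dict] = []  # in production this would be durable, append-only storage

def audited(action: str, fn: Callable[..., Any], *args, **kwargs) -> Any:
    """Run an agent action and record a traceable audit entry either way."""
    entry = {"ts": time.time(), "action": action, "args": repr((args, kwargs))}
    try:
        result = fn(*args, **kwargs)
        entry["status"] = "ok"
        return result
    except Exception as exc:
        entry["status"] = "error"
        entry["error"] = repr(exc)
        raise
    finally:
        AUDIT_LOG.append(entry)
```

Watching terminal output gives you none of this; the point is that the record exists independently of whether a human happened to be looking.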
Pilot to Production: Why One Person's Stack Won't Scale
Each architecture had a different complexity floor. ChatGPT was immediate. Claude with Notion required some configuration. OpenClaw required real technical effort just to start (old MacBook, outdated OS, terminal-only, no onboarding designed for humans).
I burned through $20 in API credits via OpenClaw before finishing initial setup, on top of subscriptions to ChatGPT, Claude, and Notion AI (offset only partly by free trials and promotional credits). Running parallel architectures (which evaluation requires) multiplies cost surfaces fast. And because these architectures overlap, you pay for redundant capability until you commit to a primary approach.
My stack is an experiment with one person doing one project. Add a second person or a second project, and coordination burden doubles, cost surfaces multiply, and context forks proliferate. That's the pilot-to-production wall. Scalability is an architectural constraint that shapes every other decision.
The Agentic Architecture Gap
Agentic architecture centers on context and execution: how agents consume data, metadata, and semantic models, and how they coordinate work across systems that were never designed to interoperate at the agent layer. The organizations treating agentic AI as a deployment problem (pick a vendor, plug it in) will hit the same walls I did.
My progression surfaced four evaluation criteria that map directly to what EMA will be investigating across enterprise IT:
Context. Can agents access persistent, structured context without export or drift? This was the closest to solved in my setup, but only when the architecture supported direct access. Export kills it.
Execution. Can agents hand off work and chain tasks programmatically, or does a human stitch every transition? This was the primary failure point. Architectures that require manual orchestration won't survive the pilot phase.
Observability. Are agent actions traceable, auditable, and reversible? Oversight can't be optional when agents operate autonomously.
Scalability. What's the total cost across overlapping tools, and how does complexity scale from one workflow to ten?
From Weekend Evaluations to Enterprise Strategy
Every tradeoff I navigated is one that enterprises are navigating as well. The tools work. The architectures run. What's missing is the infrastructure between having the right context and getting the right outcome.
I'm still in evaluation mode: comparing complexity and costs, and testing where flexibility matters and where it's overhead. Many enterprises are in the same place. The ones that treat agentic architecture as an engineering discipline will build something that works.
EMA will be surveying data, platform, and IT operations leaders about architectures, protocol strategies, observability models, and the vendor approaches emerging around agent infrastructure. The goal: an evidence base for decisions that currently run on instinct and marketing.