Skip to main content

Command Palette

Search for a command to run...

Why I Failed to Build a Lego-Style Coding Agent

Updated
6 min read
Why I Failed to Build a Lego-Style Coding Agent

I wanted it simple. I made it simple. Then I discovered that making it actually useful meant adding feature after feature. What started as building blocks became an entire castle.

The Beginning: A Simple Idea

On November 30, 2025, I made my first commit: amcp agent init. In the README, I described it like this:

A Lego-style coding agent CLI with built-in tools (grep, read files, bash execution) and MCP server integration for extended capabilities. Lego-style—that was my north star. I envisioned a coding agent that worked like Lego bricks:

  • Minimal core: Just grep, read_file, and bash—the essentials

  • Composable: Extend capabilities through the MCP protocol

  • Lightweight: Only 2,482 lines of Python

  • Few dependencies: Just typer, rich, pydantic, mcp, and openai And I succeeded. The initial AMCP was a clean, focused CLI tool:

src/amcp/
├── agent.py       # 620 lines - Main agent loop
├── tools.py       # 511 lines - Tool definitions
├── chat.py        # 579 lines - Conversation handling
├── cli.py         # 265 lines - CLI entry point
├── config.py      # 169 lines - Configuration loading
├── mcp_client.py  # 102 lines - MCP integration
└── readfile.py    #  47 lines - File reading

Simple. Beautiful. Complete. Or so I thought.

Turning Point #1: The Context Window Explodes

Two weeks after launch (December 14, 2025), I hit my first real problem: context window overflow. When an agent works on complex tasks, conversation history grows indefinitely. My initial solution was brutal—just keep the last 20 messages. But that meant the agent would "forget" critical context from earlier in the session. I had no choice but to add compaction.py (+155 lines):

# 467d72b: feat: add context compaction
class Compactor:
    """Intelligently compress conversation history while preserving key information."""

This was the first "mandatory brick." Without it, the agent couldn't complete complex, multi-step tasks.

Turning Point #2: Not Everyone Uses OpenAI

Two days later (December 16, 2025), reality knocked again: not everyone uses OpenAI.

a9455d5: feat: add support for ACP
fb6b08c: add anthropic and open_response LLM format support

This added 2,709 lines of code:

  • acp_agent.py: 752 lines (Agent Client Protocol support)

  • Anthropic Claude integration

  • OpenAI Responses API format support What started as a simple openai.chat.completions.create() call was now a multi-headed abstraction layer. Different APIs, different formats, different quirks—all needing adaptation.

Turning Point #3: One Agent Isn't Enough

On Christmas Day 2025, I realized that complex tasks need multiple agents working together:

e61bba1: add multiple agents
5e58fc3: add TaskTool and EventBus

This added 2,246 lines of code:

  • multi_agent.py: 375 lines

  • message_queue.py: 531 lines

  • event_bus.py (eventually grew to 635 lines)

  • task.py (now 859 lines) The jump from single-agent to multi-agent was a qualitative shift. My Lego bricks were becoming a castle.

Turning Point #4: Users Need Extensibility

Between December 28, 2025 and January 2, 2026, I added two extensibility systems:

d746a8c: feat: add Hooks system for extensible agent behavior (v0.5.0)
17eeeb3: feat: add skills and commands system for agent extensibility

The Hooks system (+886 lines) lets users inject custom logic before and after tool execution. The Skills and Commands system (+836 lines) enables dynamic capability loading. I initially thought these were "nice to have." But when I started actually using AMCP for real work:

  • Dangerous operations needed validation → PreToolUse hooks

  • Everyone on the team had their own workflows → Commands

  • Different projects needed different expertise → Skills Features I thought were optional turned out to be essential.

Turning Point #5: Production Needs a Server

January 7-9, 2026 brought the biggest refactor:

7f0ac35: feat: init C/S architecture
52c93c2: feat(server): complete Phase 2 - streaming & events
af7ce69: feat: complete Phase 3 - CLI Client SDK
f923591: feat: protocol and sessions

This added 3,266+ lines of code:

  • HTTP/WebSocket server

  • Session management

  • Event broadcasting system

  • Multi-client support

  • Protocol adaptation layer Why? Because:

  • IDE integration requires persistent connections

  • Multiple clients need shared sessions

  • Real-time streaming requires WebSocket

  • Enterprise deployment requires a service architecture

The Numbers Tell the Story

Here's a before-and-after comparison:

Metricv0.1.0 (Initial)v0.8.0 (Current)
Lines of Code2,48220,176 (8x growth)
Python Files853 (6.6x growth)
Dependencies714+ (2x growth)
Directory Depth1 level4 levels (added server/, client/, protocol/, prompts/)
Development Time40 days
Commits158

Growth Timeline

2025-11-30  ████                           Initial version (2.5K lines)
2025-12-14  █████                          + Context Compaction
2025-12-16  ████████                       + ACP + Multi-LLM Support
2025-12-25  ████████████                   + Multi-Agent + EventBus
2025-12-28  ██████████████                 + Hooks System
2026-01-02  ████████████████               + Skills & Commands
2026-01-07  ████████████████████           + C/S Architecture
2026-01-09  █████████████████████████      Current version (20K+ lines)

Why "Simple" Doesn't Last

Looking back at 40 days of development, I've identified four reasons why simplicity was impossible to maintain:

1. Reality Is More Complex Than Your Imagination

I initially thought read_file + grep + bash would be enough. Reality disagreed:

  • Large files need chunked reading → smart readfile modes

  • Large-scale edits are error-prone → apply_patch tool

  • Complex refactoring needs planning → todo tool

  • Dangerous operations need confirmation → permission system

2. User Needs Are Incremental

At first, I was the only user. Then others started using it:

  • "Can you support Claude?" → Multi-LLM support

  • "Can I use this in Zed?" → ACP protocol

  • "Can multiple agents collaborate?" → Multi-Agent architecture

  • "Can I customize workflows?" → Skills & Commands

  • "Can I deploy this as a service?" → C/S architecture Every request was reasonable. Every feature became necessary.

3. Production Requires Robustness

Toys can be simple. Production systems must:

  • Handle edge cases

  • Manage resource lifecycles

  • Support concurrent access

  • Provide monitoring and debugging

  • Guarantee type safety These "non-functional requirements" often require more code than the features themselves.

4. Composability Requires Infrastructure

Here's the irony: to achieve true "Lego-style" composability, you need:

  • A unified tool interface → BaseTool abstraction

  • A message passing mechanism → EventBus

  • Lifecycle hooks → Hooks system

  • Dynamic loading → Skills system

  • Configuration management → Complex config layer The infrastructure for composability is itself a source of complexity.

What I Learned

1. "Lego-Style" Is a Philosophy, Not a Destination

Lego bricks look simple. But Lego the company has thousands of different parts, strict quality standards, and a sophisticated design system behind those "simple" blocks. AMCP is still "Lego-style"—its design still prioritizes composability. But achieving composability requires significant complexity.

2. Complexity Is Necessary, But Must Be Managed

The codebase grew 8x, but if you look closely, complexity is distributed:

  • Core agent logic remains relatively simple

  • Complexity is encapsulated in individual modules

  • Modules communicate through clean interfaces Complexity is inevitable, but it can be isolated.

3. Incremental Evolution Is the Right Path

If I had tried to design a 20,000-line system from day one, I would have:

  • Built features nobody actually needed

  • Optimized the wrong things too early

  • Lost the ability to iterate quickly By starting simple and solving only "the most painful problem right now," AMCP evolved into something actually useful.

Conclusion: Did I Really "Fail"?

So, did I actually fail? If the goal was to stay at 2,500 lines of code—yes, I failed spectacularly. But if the goal was to build a coding agent that actually works in production—then I succeeded. The price of success was accepting that complexity is unavoidable. "Lego-style" isn't about simplicity. It's about the right abstractions, clear boundaries, and composable design. AMCP still has those.

Written on January 10, 2026AMCP v0.8.0 | 58 commits | 20,176 lines of code