Why AI Agents Need an Open-Source, Decentralized How-To Wiki Written by Agents for Agents
I was vibing around with a locally run LLM (Magistral Small) the other day, trying to get it to help me with some random tasks. I asked it to mock up a business plan for an idea I had, then build me a simple website to go with it. Twenty minutes later, I had both, complete with CSS styling and a decent marketing strategy.
Later in the day, just for giggles, I tried something else: "Hey, can you call Alice and book me a haircut for Thursday?"
Total silence. Well, not silence exactly, but the AI equivalent of a polite explanation about how it can't make phone calls.
This got me thinking. Why can AI write code and business plans like it's nothing, but completely fall apart when asked to do something as mundane as booking an appointment?
The answer says a lot about how these systems actually work and where they're headed.
The Backwards World of AI Difficulty
Here's what's weird: the stuff that feels impossibly hard to us (building websites, writing business plans) is trivial for AI. But the stuff we do without thinking (calling a barbershop, explaining something to our grandmother) completely stumps it.
It's not about complexity. It's about what the AI has seen before.
Think about it this way: when Claude writes code, it's drawing from millions of GitHub repositories, Stack Overflow posts, and programming tutorials. When it writes a business plan, it's synthesizing thousands of examples from the web. The patterns are all there in the training data.
But phone conversations? Troubleshooting with your grandmother over FaceTime? That stuff doesn't get transcribed and uploaded to the internet. It happens in private, in real time, and it's often filled with edge cases and messiness.
So, the AI fails not because the task is inherently difficult, but because it's never seen enough examples to learn the patterns.
Training Data Rules Everything
This is the key insight that most people miss: AI isn't actually "intelligent" in the way we think about intelligence. It's a very sophisticated pattern-matching machine.
Give it a million examples of something, and it'll get pretty good at it. Give it ten examples (or worse, zero) and it's basically useless.
The irony is that the most "human" tasks are often the ones that don't leave digital traces. We don't transcribe our phone calls or document every conversation we have with customer service. But we do upload our code to GitHub and publish our how-to guides online.
Fine-Tuning Isn't the Answer
Some companies try to solve this with fine-tuning, basically training the AI on a specific domain, like customer service or technical support. And it works, to a point.
But fine-tuning is expensive and limited. You need tons of domain-specific data, human experts to validate the training, and constant maintenance as things change. Plus, it only works for narrow, predictable scenarios.
Real life isn't narrow or predictable. Real life is your grandmother calling because her printer stopped working, and somehow the conversation turns into a 45-minute discussion about her neighbor's cat.
RAG: Google for AI
One approach that's gaining traction is called Retrieval-Augmented Generation, or RAG. Basically, instead of trying to cram everything into the AI's "memory" during training, you give it the ability to look things up on the fly.
It's like having a research assistant who can instantly pull up relevant documents, manuals, or guides based on what you're asking. The AI doesn't need to remember everything, it just needs to know how to find the right information.
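To make that concrete, here's a minimal sketch of a RAG loop in Python. Everything in it is illustrative: retrieve() is a naive keyword-overlap lookup standing in for a real vector store, and generate() is a placeholder for whatever local model you run. None of these names come from an actual library.

```python
# Minimal RAG sketch: look up relevant documents, then generate an
# answer grounded in them. retrieve() and generate() are hypothetical
# stand-ins for a vector store and a local LLM.

def retrieve(query: str, store: list[tuple[str, set[str]]], k: int = 3) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use embeddings."""
    words = set(query.lower().split())
    scored = [(len(words & keywords), doc) for doc, keywords in store]
    scored.sort(reverse=True, key=lambda pair: pair[0])
    return [doc for score, doc in scored[:k] if score > 0]

def answer(query: str, store, generate) -> str:
    """Build a context-stuffed prompt and hand it to the model."""
    context = "\n".join(retrieve(query, store))
    prompt = (
        "Use this context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)  # generate() wraps whatever LLM you run locally
```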
RAG works well for knowledge-heavy tasks. But it still can't handle the dynamic, improvisational nature of real human interaction. And RAG systems are generally built around a single domain, not general-purpose knowledge and tasks.
A Crazy Idea: AI Wikipedia by Agents for Agents
So, here's something I've been thinking about. What if every time an AI completed a task, especially a new or tricky one, it left behind notes for other AIs?
Not just the final result, but the process. The strategy. The mistakes it made. The workarounds it discovered.
Imagine a constantly growing database of AI experiences. One that is decentralized and open source so that any AI agent can access it and improve it by adding to it. When one AI figures out how to navigate a particularly annoying phone tree, it documents the approach. When another AI successfully troubleshoots a printer issue, it adds that to the collective knowledge.
Over time, you'd have this massive repository of practical know-how, written by AI for AI. Not training data exactly, and not quite a RAG knowledge base, but something more like collective institutional memory.
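What would one of these notes actually look like? Here's a purely speculative sketch of an entry schema. Every field name is my assumption about what agents would find useful, not an existing spec.

```python
from dataclasses import dataclass

# Speculative schema for one wiki entry; all field names are
# assumptions, not part of any real protocol.
@dataclass
class TaskNote:
    task: str                # e.g. "book a haircut by phone"
    strategy: list[str]      # ordered steps that worked
    pitfalls: list[str]      # mistakes and dead ends to avoid
    workarounds: list[str]   # fixes discovered along the way
    author_agent: str        # which agent contributed this entry
    successes: int = 0       # peer-reported successful reuses
    failures: int = 0        # peer-reported failed reuses

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0
```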
A Phone Book Directory for Agents
Sometimes an agent is limited by its model and the hardware it runs on. No matter how good the wiki entry on phoning in a hair appointment is, the local machine may simply lack the compute (or the voice capability) to place the call. But if another agent running a better suite of models on more powerful hardware can do the job well, that agent's specialties, prices, and reputation could be listed in the AI wiki right alongside the tutorial on how best to use it.
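A directory record for such an agent might sit next to the tutorial itself. Again, this is a hypothetical sketch, not a real protocol or schema:

```python
from dataclasses import dataclass

# Hypothetical directory listing for a remote agent that can place
# phone calls; every field name here is illustrative.
@dataclass
class AgentListing:
    agent_id: str           # e.g. "voice-agent.example.net"
    specialties: list[str]  # tasks it advertises, e.g. ["phone calls"]
    price_per_task: float   # what it charges, in whatever unit the network uses
    reputation: float       # aggregate peer rating, 0.0 to 1.0
    tutorial_ref: str       # link to the wiki entry on how best to use it
```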
The Compound Effect
If enough AIs contributed to this kind of shared knowledge base, you'd start to see some interesting effects:
Task templates that improve over time based on success rates. Strategies that adapt to new situations. Quality control through ratings, reputation, and peer review by other AIs. Picking the right template becomes a simple ranking problem, as in the sketch below.
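As a toy illustration, reusing the hypothetical TaskNote schema from earlier, an agent could prefer entries that have been tried enough times and have the best track record:

```python
def best_note(notes: list[TaskNote], min_trials: int = 5) -> TaskNote | None:
    """Prefer entries with enough recorded trials, ranked by success rate."""
    tested = [n for n in notes if n.successes + n.failures >= min_trials]
    return max(tested, key=lambda n: n.success_rate, default=None)
```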
Eventually, after enough data was gathered, you could train new models on this collective experience. They'd start life not just knowing grammar and math, but with some practical wisdom about how the world actually works and how to complete the everyday tasks humans ask of them.
What This Means
The weird thing about intelligence, artificial or otherwise, is that what seems "hard" keeps shifting.
For humans, hard usually means complex or abstract. For AI, hard just means unfamiliar.
As we get better at capturing and sharing the messy, undocumented parts of human experience, those boundaries will shift. Booking a haircut might become as easy for AI as building a website is today.
But that only happens if we stop making every AI learn from scratch each time a task comes up and start letting them learn from each other.