How to extend your coding harness without losing your shit.

So yes, I have opinions.

No, I haven’t made anything that plays Doom in a TUI. No token-burning agent swarm extension. No magical background shit that allegedly saves you tokens by spending tokens with another LLM. Just boring workflow stuff.

Anyway.

Most extensions suck.

Not because they don’t work. Most of them do. That’s not the problem.

They suck because they were built for exactly one person: their developer. And that developer usually thinks their own workflow is obvious.

Many devs have no idea about UI/UX. Some have a bit of DX taste, mostly from using things, not creating them. So you get these control-freak extensions with a dozen /commands. Every command has its own special parameters. You’re supposed to pass X but not Y, unless Z, and if you get it wrong the Clanker shits the bed, does nothing, or confidently does the opposite of what you wanted.

If that sounds like a headache, it’s because it is.

You install an agent-team-ish extension and get handed a README dissertation, as if you’ve just adopted an entire piece of enterprise software. This is why frontier labs and native harnesses have features that “just work”. Because it was not one guy in his bedroom with a brilliant idea and a concerning amount of coffee. It was probably a few people, a whiteboard, dogfooding, and a team of Clankers repeatedly finding every stupid edge case. And shitting their beds over and over again.

Let’s get into the extension sins.

One thing at a time.

An extension should do ONE THING. Not because we should all be based minimalists like a certain Austrian developer.

Coding agents already operate inside enough invisible state and have enough training data to confidently slop their way through whatever task you give them. Your repo, prompts, your holy 200 LoC AGENTS.md files, your 50 skills downloaded from the internet (probably made by someone a lot less skilled in operating AI than yourself), the weird shit the agent inferred three turns ago. Adding another thing to manage is how you invent a second job for yourself and how you inadvertently create more slop. With all the best intentions.

I’ll stick to agent-team-ish extensions because they are both easy and hard at the same time.

You can vibe up a subagent system that works automagically without much user input. Like CC/OC/Codex-style review or exploration flows. Who doesn’t love subagents?

Then there are variations of them that are mostly worse versions of the originals, with /commands that only a superhuman can memorise. Maybe some of them are good. I haven’t seen one that doesn’t feel like micromanaging interns, so I skip them.

What I propose instead is painfully boring: make one extension per useful subagent. Explore, review, summarise, whatever the hell you fancy.

If your explore agent works well and you stop thinking about it, then maybe add another type. If review becomes part of your actual day, keep it. If you only run the fancy orchestrator when you remember it exists, delete the fancy orchestrator.

Only add things when you keep prompting the same damn workflow manually over and over again. My housemate has this principle: “If I don’t want something in a couple of weeks, it means I never needed it in the first place”.

Note that this mostly relates to Pi, since it doesn’t natively support subagents. Which makes it a very good example. Missing native feature. OOOOH. SHINY. LET’S CLANK.

Too many options.

If your extension requires an operating manual, it’s bad. Yes, yes, some things are genuinely complex. Spare me the edge case. Most coding-agent extensions are not complex because the problem is complex. They are complex because the author never subtracted anything.

I know I’m circling back to /commands, but they are the easiest example. I can’t stand having /commands in my coding agent that go beyond the absolute minimum. It causes me physical pain when I see people invoking skills from /commands and proudly shipping 20+ of them in a bundle.

Do yourself a favour and ask how many you actually use.

I currently use three.

/review. Branch-based subagent review, no params, spits out a review summary and appends enough context so the main agent doesn’t instantly run off fixing useless suggestions.
/marker. Drops an in-session marker.
/end. Summarises, goes back to the marker and advances it without me dabbling in Pi’s session tree.

When you’re creating an extension, consider the UX. Can it be a tool for the Clanker? Can it be a programmatic prompt injection through a /command? Does the human need to touch it at all? Is this parameter solving a real problem, or did the model add it because it sounded “flexible”? Cut out 80%. Leave the 20 that matter.

Let’s venture into making music for a second. Loop-based electronic music specifically, because that’s what I’m familiar with.

One way to make a track is to vibe hard. Add layers until it sounds full as fuck. Then you take a break, come back, and start muting things. How much can you subtract while keeping its soul intact? How many random effects and layers did you add that don’t actually carry the message?

Same with extensions.

You tell the Clanker to build a thing. It builds the thing. Then it adds ALL THE RANDOM FEATURES you never asked for. You go “hm, that makes sense” because technically it does. Then you never use that feature once.

If you never use something in your own extension, how does it benefit anyone else? Does it benefit your precious Clankster?

See what can be automated and simplified. Agents are good at inferring what they’re supposed to do when the rails are sane. Give them better instructions. Instead of exposing another parameter to the user, hide the decision inside the agent tool.
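A minimal sketch of what hiding the decision can look like for something like /review. Everything here is hypothetical: pickBaseBranch, the preferred-branch list, and the assumption that the extension already has the branch names at hand. The point is only that the tool works out the base branch itself instead of asking the human for yet another parameter.

```typescript
// Hypothetical helper: none of these names come from Pi or any real extension.
// Instead of a user-facing "base branch" parameter, the tool picks one itself.
const PREFERRED_BASES = ["main", "master", "develop"];

function pickBaseBranch(branches: string[], current: string): string | null {
  // First preferred branch that exists and is not the branch under review.
  for (const candidate of PREFERRED_BASES) {
    if (candidate !== current && branches.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  // Nothing sane to compare against: let the caller ask the human, once.
  return null;
}

console.log(pickBaseBranch(["feature/login", "main"], "feature/login")); // main
```

The human only gets bothered in the null case, which is the one situation where a question is actually warranted.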

Speaking of tools.

The Clanker is confused. It hurt itself in confusion.

One of my AI-savvy mates made me inspect my extension’s token consumption. He said it was using a suspiciously high amount of tokens. I checked.

A full-on reimagining of Pi’s toolkit was consuming 2k tokens per “Hi” instead of Pi’s native 1k. Outrageous, I know. Challenge accepted.

I had a look. It worked, but it was a mess. Doubled-up instructions here and there. Everything too verbose. Lots of friendly little explanations that the Clankers love.

One example was in a tool JSON schema.

"workdir": {
  "type": "string",
  "optional": true,
  "description": "Optional working directory; defaults to current turn cwd"
}

Can you spot the issue?

Classic idem per idem. Explaining something by repeating it: the field is already marked optional, and the description says “Optional” again. Clankers love that. This is now:

"workdir": {
  "type": "string",
  "optional": true,
  "description": "Defaults to current turn cwd"
}

That’s 5 tokens saved right there. Who cares, right? Right… Now multiply it by 20 lines of similar stuff. Then by every turn. Then by every extension. You see my point?
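To put rough numbers on that multiplication: the 5 tokens and the 20 lines come from above, while the turn count and the number of extensions below are made-up assumptions, so plug in your own.

```typescript
// Back-of-the-envelope cost of verbose schema descriptions.
// tokensPerLine and lines come from the text; turns and extensions are assumed.
function wastedTokens(
  tokensPerLine: number,
  lines: number,
  turns: number,
  extensions: number
): number {
  return tokensPerLine * lines * turns * extensions;
}

const perTurn = wastedTokens(5, 20, 1, 1);     // 100 tokens per turn, one extension
const perSession = wastedTokens(5, 20, 50, 4); // 20000 tokens over 50 turns, 4 extensions
console.log(perTurn, perSession);
```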

The same extension had “prompt guidelines” in Pi, which go into the system prompt as bullet points. They further explained how the agent should behave in its altered environment. Except most of them were redundant with the tool schema, so I cut them.

That’s one extension. Minus 500 tokens.
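A crude way to shortlist guidelines that only repeat the schema, sketched under assumptions. The guideline strings are invented examples, overlapRatio is a hypothetical helper, and plain word overlap is a blunt heuristic, not how Pi or any model judges redundancy. It only tells you which lines to read with your own eyes before cutting.

```typescript
// Hypothetical check: how much of a guideline is already said by the schema?
function words(text: string): string[] {
  // Lowercase and keep words longer than 3 characters to skip filler.
  const found: string[] = text.toLowerCase().match(/[a-z_]+/g) || [];
  return found.filter((w) => w.length > 3);
}

function overlapRatio(guideline: string, schemaText: string): number {
  const guidelineWords = words(guideline);
  if (guidelineWords.length === 0) return 0;
  const schemaWords = words(schemaText);
  const shared = guidelineWords.filter((w) => schemaWords.indexOf(w) !== -1);
  return shared.length / guidelineWords.length;
}

// The schema text is the description from earlier; the guidelines are invented.
const schemaText = "Defaults to current turn cwd";
const guidelines = [
  "workdir defaults to the current turn cwd",
  "Never commit without asking",
];

// Anything mostly covered by the schema is a candidate for deletion.
const redundant = guidelines.filter((g) => overlapRatio(g, schemaText) >= 0.6);
console.log(redundant);
```

The 0.6 threshold is arbitrary. Tune it until the shortlist stops lying to you.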

Another one was a semantic grep extension that agents can use for natural-language queries in a codebase. This was the worst offender.

Use early to discover relevant files, behavior, concepts, or docs.

Just randomly injected into the system prompt without enough context or explanation. Poor Clanker had no idea what it was supposed to refer to, so it guessed based on the tool schema and whatever context-pulp it was being fed. Obviously, my bad for not checking, but…

Surprisingly, the extension was fully functional. Probably not thanks to that system prompt line. The idea must have simply been brilliant, heh.

This is now:

semantic_grep: Use early to discover relevant files, behavior, concepts, or docs.

See the diffs below:

And while clanking through these descriptions, Gippity told me: “If we want to keep these human-readable…” Oh, PS. Every goddamn symbol is a token. Wrapping semantic_grep in quote marks looks nicer, but they are super unnecessary.

U WOT MATE. THE ONLY HUMAN READABLE THING IN THIS EXTENSION SHOULD BE THE README.

You can’t just vibe-slop your way through extensions. At least not completely. Know what you’re developing for. Ask the Clanker to put descriptions and schemas into separate, easily readable files so you can inspect them without digging through code - even if it creates more files than a single index.ts. Test the extension in isolation. Say “Hi” to your Clanker. See precisely what it puts into context. Then, in Pi, /share => review.
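One possible shape for that separate file, as a sketch. The layout, the ToolText type, renderGuidelines and roughTokens are all hypothetical and not Pi’s extension API. The 4-characters-per-token figure is only a rough rule of thumb for English text, so treat the number as a smoke alarm, not a measurement. The semantic_grep guideline and the workdir description are borrowed from the examples above; the tool description is invented.

```typescript
// descriptions.ts (hypothetical): every string the model will see, in one place.
interface ToolText {
  description: string;            // goes into the tool schema
  params: Record<string, string>; // parameter name -> description
  guidelines: string[];           // lines destined for the system prompt
}

const TOOLS: Record<string, ToolText> = {
  semantic_grep: {
    description: "Natural-language search over the codebase", // invented wording
    params: { workdir: "Defaults to current turn cwd" },      // borrowed for illustration
    guidelines: ["Use early to discover relevant files, behavior, concepts, or docs."],
  },
};

// Prefix each guideline with its tool name so the line makes sense on its own.
function renderGuidelines(tools: Record<string, ToolText>): string[] {
  const lines: string[] = [];
  for (const toolName of Object.keys(tools)) {
    for (const guideline of tools[toolName].guidelines) {
      lines.push(toolName + ": " + guideline);
    }
  }
  return lines;
}

// Very rough estimate: about 4 characters per token. An assumption, not a tokenizer.
function roughTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const rendered = renderGuidelines(TOOLS);
console.log(rendered[0]);
console.log(roughTokens(rendered.join("\n")), "tokens, roughly");
```

With all the model-facing text in one file, the review is a two-minute read instead of a grep expedition through index.ts.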

Developing for the terminal doesn’t mean you don’t need UI.

We interact with coding-agent interfaces by looking at them. Mostly. Shocking revelation, I know.

Terminal UI is still UI. It needs a quickly skimmable state and obvious actions. If you create an extension and cannot see its entire interface in one place, it’s bad and you should probably scrap half of it.

I have a skills+workflows extension. TLDR: skills and workflows are close, but workflows are for Clankers to create agentically via a tool. Doesn’t mean I never want to see them. Doesn’t mean they should all become /commands either.

Screenshot: Skills and workflows extension UI

Screenshot: Skills and workflows extension detail

Couldn’t find a workflow, sadly. They are basically the same. But I keep my skills global and workflows repo-level, with tiny AGENTS.md files suggesting that the agent should probably read the workflow file if it wants to operate in this folder.

If you’re making something with /commands, ask whether it should be one command and a tiny UI behind it. A list. Tabs. Every selectable option gets a clear description so the user instantly knows what it does without reading the friendly manual.
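A sketch of the one-command-plus-tiny-UI idea, reduced to pure functions that render a selectable list. MenuItem, renderMenu, moveSelection and the item texts are hypothetical. Actual drawing and key handling depend on your harness and are not shown.

```typescript
// Hypothetical menu model: one entry point, every option carries its own description.
interface MenuItem {
  label: string;
  description: string;
}

// Render the whole interface as plain lines; ">" marks the selected row.
function renderMenu(items: MenuItem[], selected: number): string[] {
  const width = items.reduce((max, item) => Math.max(max, item.label.length), 0);
  return items.map((item, i) => {
    const marker = i === selected ? ">" : " ";
    // Pad labels to equal width so the descriptions line up in a column.
    const padding = new Array(width - item.label.length + 1).join(" ");
    return marker + " " + item.label + padding + "  " + item.description;
  });
}

// Arrow keys move the selection and wrap around at both ends.
function moveSelection(selected: number, delta: number, count: number): number {
  return (selected + delta + count) % count;
}

const items: MenuItem[] = [
  { label: "skills", description: "Global skills, shared across repos" },
  { label: "workflows", description: "Repo-level workflows created by the agent" },
];
console.log(renderMenu(items, 0).join("\n"));
```

If the rendered list doesn’t fit on one screen, that is the hint to start cutting options.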

Non-agentic agentic tools.

One trick nobody seems to use enough: your extension can be an agent-invokable CLI.

Just like entire software solutions can be a markdown file these days, an extension doesn’t have to be a fully agentic tool with a giant schema.

At least in Pi’s case, I can expose the same thing I have available in the GUI to the agent. You make the GUI perform an action programmatically, like entering a todo item with a description. It gets saved as a markdown file. Then you make a separate action that sends the location of the CLI to the agent as a prompt and appends your input to it, and the Clanker suddenly gets a todo tool and can go BRRRRR.

Yet another thing I made and never used, because todos are soooo 2025.

With that you don’t waste tokens on a massive tool description. You “politely” tell the agent to call a file with “--help”, like some kind of caveman. But it works. Congratulations, you just made a skill+script and bundled it into a GUI that you can use yourself if you want to.
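A sketch of that caveman setup, assuming a hypothetical todo CLI. runTodoCli, buildAgentPrompt, the add and list subcommands and the ./todo path are made up for illustration. An in-memory object stands in for the markdown files on disk so the example stays self-contained; a real script would read its arguments from the process and write actual files.

```typescript
// Hypothetical todo CLI. `Files` stands in for markdown files on disk.
type Files = Record<string, string>;

const HELP = [
  "usage: todo <command>",
  "  add <title> [description]   save a todo as a markdown file",
  "  list                        print saved todo file names",
].join("\n");

function runTodoCli(args: string[], files: Files): string {
  const command = args[0];
  if (command === "add" && args[1]) {
    const title = args[1];
    // Turn the title into a file name: "Fix login" -> "fix-login.md".
    const fileName = title.toLowerCase().replace(/[^a-z0-9]+/g, "-") + ".md";
    files[fileName] = "# " + title + "\n\n" + (args[2] || "") + "\n";
    return "saved " + fileName;
  }
  if (command === "list") {
    return Object.keys(files).join("\n");
  }
  // "--help", no arguments, or anything unknown: show the manual.
  return HELP;
}

// The GUI action sends this to the agent: where the CLI lives, plus the user's input.
function buildAgentPrompt(cliPath: string, userInput: string): string {
  return "Run `" + cliPath + " --help` to see the todo commands, then: " + userInput;
}

const files: Files = {};
console.log(runTodoCli(["add", "Fix login", "Session expires too early"], files)); // saved fix-login.md
console.log(buildAgentPrompt("./todo", "add todos for the failing tests"));
```

The only thing that lands in context up front is the one-line prompt. The help text gets paid for only when the agent actually asks for it.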

Extensions are yours.

These days it’s easier to vibe up your own extension and refine it as you go than to modify existing ones. Even cloning a repo and modifying it is a waste of time.

Best part of doing your own thing is that you don’t need to talk to the maintainer. You don’t need to file an issue, submit a PR. You just tell the Clanker that you want it changed, and if it works better… Great.

99% of the stuff I see doesn’t quite fit my workflow. And it probably doesn’t fit yours either if you have to put in work to make it flow. I’ll show myself out.

Whatever I put out is an experiment. A work in progress. Something that gets refined on a daily basis if it annoys me. Some extensions get abandoned because they seemed like a good idea at the time. Some were probably worth putting out there in case people had suggestions. Some will die of natural causes, especially with models getting better.

And that’s fine. We’re all learning. That’s what extensions are supposed to do: they should disappear into the workflow.

If I have to remember your command syntax, you failed.
If the Clanker needs 20k tokens to say “Hi” back, you should rethink your decisions.
If reading the README takes longer than cloning the repo and asking the agent what the heck this thing actually does, wipe this vibeslop off your drive.

Build small little tools for yourself. Use them until they annoy you and you need to rework them. Cut the stuff that doesn’t quite work. Polish the turds. Publish them if they survive.

Or don’t. Half of them will be obsolete when the next model drops anyway.

Anyway, back to clanking.

Originally published on: https://x.com/Howaboua/status/2053632905461518783?s=20