Assessing the Latest AI Coding Hotness
January 29, 2026
“Have you tried out the Ralph Wiggum loop?!”
“Are you still using beads?”
“Is Gas Town actually legit?”
“Know anyone who’s into BMAD?”
Most days these days I’m grateful my family doesn’t know what my Serious Work Conversations sound like. It’s kinda embarrassing.
Let’s talk about how to deal with all this noise. You’re staring at a breathless social media post (probably with a really cringe AI-generated image of a robot “writing software”) or Slack share talking about the latest AI-coding flavor of the week. You’re wondering “is this actually worth me checking out?”
Here’s the heuristic I use - focus on whether it helps address the LLM’s fundamental limitations by asking:
- does this new approach systematically improve how the agent manages context?
- does this new approach systematically improve how the agent gets human feedback?
Read on and I’ll elaborate…
The problem
Keeping track of the AI coding landscape is legitimately exhausting.
The models are improving fast. Beyond the models themselves, a ton of smart people are rapidly improving the coding agents. Beyond THAT, an even larger ton of smart people are experimenting with better ways to use these agents, coming up with libraries, frameworks, and methodologies at a wild pace (and yes, most of them have sorta dumb names).
I spend a good amount of my time eye-deep in this stuff and I still have a hard time keeping up.
What we need is… a META-FRAMEWORK!
Clearly what’s needed is a systematic way to assess these systems!
I’m sort of kidding, but I have been thinking a lot about how to sort the wheat from the chaff when it comes to this torrent of hyped AI-coding practices.
Some of these frameworks feel off to me - I’m skeptical they’ll have a significant impact - but until recently I’ve struggled to articulate why.
I’m starting to realize that it comes down to a simple question: does this new thing address either of the two big limitations of LLMs - context and taste?
Context and taste
After helping explain AI coding techniques to hundreds of engineers - over 800 so far! - what’s crystallized for me is that when a coding agent fails to produce a good solution, the reason boils down to one of two things:
- it has bad context - it’s missing useful information, or distracted by irrelevant information.
- it has mediocre taste - just as with other creative output like art or poetry, an LLM doesn’t have a great sense of design when it comes to writing code.
Without human feedback, LLMs tend to write software like an early-career engineer - code that solves the immediate problem but accumulates “design debt” over time, leading to software that becomes brittle and buggy.
Avoiding this trap and succeeding with coding agents is all about solving these two problems of bad context and meh design. Each problem has a corresponding solution:
- We address bad context with better context management
- We address the LLM’s poor design taste by setting up strong feedback loops from human engineers, who have a better design sense (see the sketch below)
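To make that second point a little more concrete, here’s a minimal sketch of what a human feedback loop can look like in practice. Everything here is hypothetical - the agent object and its methods aren’t from any particular tool - the point is just the shape: the agent proposes, a human with better design sense decides.

```python
# A minimal sketch of a human-in-the-loop review gate.
# `agent` and its methods are hypothetical stand-ins, not a real API.

def run_task_with_review(agent, task: str, max_rounds: int = 3) -> str | None:
    feedback = ""
    for _ in range(max_rounds):
        # Agent drafts a change, incorporating any prior human feedback.
        diff = agent.propose_change(task, feedback=feedback)

        print(diff)
        verdict = input("Approve this change? [y/N/notes] ").strip()

        if verdict.lower() == "y":
            return diff        # a human signed off - ship it
        feedback = verdict     # anything else is treated as review notes

    return None                # gave up - escalate to a human entirely
```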
That’s really the crux of it - if I see an AI coding framework or methodology that provides some sort of solution in either or both of these categories, then I consider it worth checking out. On the other hand, if I hear of a framework that doesn’t seem to address either of these, then I’m happy to let other people play with it. I’ll check back in a couple of months and see if it’s still gaining attention.
Examples and counter-examples
Let’s look at some examples and counter-examples.
Beads helps break a problem down into smaller tasks, each of which can be handed off to a subagent with a focused context window. Seems like that’s going to help with context management - probably worth a look.
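I won’t pretend this is Beads’ actual API - I haven’t read its internals - but the underlying pattern looks roughly like this sketch: break the goal down, then hand each piece to a subagent that starts with a small, focused context.

```python
# Rough sketch of the decompose-and-delegate pattern (not the real Beads API).
# `planner` and `spawn_subagent` are hypothetical stand-ins.

def solve_with_subagents(planner, spawn_subagent, goal: str) -> list[str]:
    # One agent breaks the goal into small, self-contained tasks...
    tasks = planner.break_down(goal)

    results = []
    for task in tasks:
        # ...and each task gets a *fresh* subagent whose context contains
        # only what that task needs, not the whole conversation so far.
        sub = spawn_subagent(context=task.relevant_files)
        results.append(sub.run(task.description))
    return results
```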
The Ralph Wiggum loop is centered on the idea of giving a high-level prompt to an agent and having it repeatedly attack the problem, restarting with a fresh context window each time. I don’t really see how this improves the agent’s context management, and it definitely doesn’t help with human feedback loops - sort of the opposite, really. I’m happy to keep Ralph at arm’s length.
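For reference, here’s the loop as I understand it, with made-up names rather than any real implementation. Note what’s missing: no human feedback ever enters the loop.

```python
# The Ralph-style loop as I understand it (hypothetical names, no real API):
# keep throwing a fresh agent at the same high-level prompt until something
# passes the checks.

def ralph_loop(make_agent, prompt: str, passes_checks, max_attempts: int = 10):
    for _ in range(max_attempts):
        agent = make_agent()          # brand-new context window each time
        result = agent.run(prompt)    # one unsupervised attempt at the goal
        if passes_checks(result):
            return result
    return None
```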
Rubbing more LLMs on it probably won’t help
One more heuristic I use: I’ve developed an instinctive skepticism toward any solution that tries to solve the limitations of LLMs by just adding more LLMs.
For example, methodologies that claim to address the design weaknesses of a coding agent by having ANOTHER coding agent review the work with fresh eyes. I can definitely see that this will catch some things, but I’m also pretty confident that the LLM won’t provide much in the way of insightful design critique or impactful course correction. Now, having a human work WITH the reviewing LLM - THAT seems feasible. But just moving the work through additional context windows and LLMs seems like an opportunity for compounding confusion.
Finding the good stuff
Don’t get me wrong, there is clearly an amazing amount of valuable innovation going on in AI coding. The challenge is figuring out what’s worth your time and what’s grifter noise.
Now you have at least one heuristic to use. If something helps manage context better or creates tighter human feedback loops, it’s probably worth checking out. If not, maybe wait a month or two and see if it’s still on anyone else’s radar.
That said, maybe you think I’m way off here. I’d love to hear that! Let me know what your heuristics are, or if you have an example where my heuristic would have let you down.