Get Good at Agents

Jan 21

The tools are getting so powerful that we need to change how we scope, manage, and approach our work.

16 Comments

Interesting workflow: GPT for planning and Claude Code for execution. Lately, I've found myself using only Claude Code, and I wonder whether I'm capping what I can achieve. The most interesting takeaway from this article is the need for a peaceful mind to get the best out of this technology. What I find most useful, rather than running multiple instances and requesting everything at once, is going for a walk. I'm fortunate to live near a beautiful park. I go there without my phone, just with my diary, and start taking notes on future implementations, where new ideas naturally emerge. I come back home, sit at my computer, and then prompt Claude with more intention.

Randall Bennington

Jan 21

Your comments about GPT 5.2 Pro and Codex match my experience as well. It's like Harrison Bergeron trying to dance. My own setup today is Actor-Critic: 5.2 plans -> Opus 4.5 executes -> 5.2 checks and approves commits. It works, but the tokens go burrrrrrrrrrrrrr.

One other thing - Claude's OCR is awful. For anything image-related, I have a pretool/hook & /skill combo that pipes out to Gemini. Lighter than loading up an MCP.
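
The plan -> execute -> check loop described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: `Review`, `actor_critic_loop`, and the three callables are hypothetical names, and in practice each callable would wrap a call to whichever model plays that role.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    """Critic verdict: approve the change, or reject it with feedback."""
    approved: bool
    feedback: str = ""


def actor_critic_loop(
    task: str,
    plan: Callable[[str], str],             # hypothetical: planner model writes a plan
    execute: Callable[[str], str],          # hypothetical: executor model returns a diff
    review: Callable[[str, str], Review],   # hypothetical: critic model checks the diff
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Plan once, then alternate execute/review until approved or out of rounds.

    Returns (approved_diff, rounds_used); approved_diff is None if the critic
    never approved within max_rounds.
    """
    current_plan = plan(task)
    for round_no in range(1, max_rounds + 1):
        diff = execute(current_plan)
        verdict = review(current_plan, diff)
        if verdict.approved:
            return diff, round_no
        # Feed the critic's objections back to the executor on the next round.
        current_plan = f"{current_plan}\n\nReviewer feedback: {verdict.feedback}"
    return None, max_rounds
```

Capping `max_rounds` is one way to keep the token bill bounded when the critic and the executor keep disagreeing.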

Reply (1)

Rob Bru

Jan 22

Claude is as good as blind when it comes to OCR!

Reply (1)

Kevin Northover

Jan 27

Depends on your problem. I’ve been working on a tool to extract the text from advertisements in 1880s community newspapers. These have very varied typography and around 40-50 ads per issue. I used Gemini throughout the development, and it needs the high-end models to work at all well. I recently ran a test with Opus 4.5, and on initial inspection it is at least as good as Gemini’s best model.

Cool project

Nathan, could you please explain how you configure / accomplish this: "I often have Claude Code pass information back to GPT 5 Pro for a deep search when stuck with a very detailed prompt"?

Reply (1)

Nathan Lambert

Jan 21

I ask Claude to summarize the situation for deeper search, then I tell GPT to write a plan for Claude — so I do it very literally.
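
A rough sketch of that manual handoff as two prompt templates. The wording and the names `SUMMARY_REQUEST` and `planning_prompt` are illustrative assumptions, not the exact prompts used; the text is pasted between the two tools by hand.

```python
# Step 1 (hypothetical wording): paste this into the coding agent when it is stuck.
SUMMARY_REQUEST = (
    "Summarize the current situation for a deeper search: what we are trying "
    "to do, what has been tried, and the exact errors or open questions."
)


def planning_prompt(summary: str) -> str:
    """Step 2: wrap the agent's summary in a request for a plan it can follow."""
    return (
        "Below is a summary written by a coding agent that is stuck.\n\n"
        f"{summary.strip()}\n\n"
        "Research the problem in depth, then write a step-by-step plan "
        "for the coding agent to carry out."
    )
```

The plan that comes back is then pasted into the coding agent as its next instruction.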

Parth Tiwary

Jun 25

Getting good at agents is mostly getting good at reading what the agent was handed. The model is rarely the bottleneck. The harness, the tools, the context window are. Teams that learn to read traces debug in one look. Everyone else rewrites prompts and guesses.

Pawel Jozefiak

Feb 27

This matches my experience almost exactly. I've been running an autonomous AI agent (Wiz) for a few months now. Started by micromanaging every action, asking for permission on everything. Now it runs night shifts, makes git commits, and manages its own task queue.

The hardest part wasn't the tech - it was learning to let go. You have to redesign your workflow around delegation, not assistance. I actually built a native dashboard just to keep up with what my agent was doing autonomously. https://thoughts.jock.pl/p/wiz-1-5-ai-agent-dashboard-native-app-2026

The moat is definitely in judgment, not effort.

siyu

Feb 3

You say "Claude Code has me considering what should I work on now that I know I can have AI independently solve or implement many sub-components."

Are you relying on Claude to write production-grade code, or is this proof-of-concept work?

Anuj Kumar

Jan 31

So many people are writing posts about Claude and Opus that it seems suspicious. Is Anthropic paying people to promote Claude?

Taylor Rock

Jan 26

"Leave Claude on while I'm away and see what it comes up with" - wondering if you can walk through the workflow here. I'm familiar with Ralph loops, but this sounds like something more emergent/different that enables a greater degree of freedom.

Jihyun

Jan 22

great read - i think this archives the 2026 zeitgeist post opus 4.5. i sometimes find myself spending more time optimizing my /.claude than the actual product.

Jagan Seshadri

Jan 22

The Claude piece was a great read! The fact that there's no written guide and we have people tinkering with it en masse is really cool. Looking forward to seeing how the reasoning/planning tokens (traces) can be RLed on to improve this further.

Ethan

Jan 22

I’m curious if you are doing all of this for new personal projects or integrating with existing enterprise prod code? Thank you.

Josh Devon

Jan 21

Really well said. The power of Gas Town and Ralph Wiggum is the future, but these incredible capabilities also need to be made reliable and governable. Wrote about that here: https://securetrajectories.substack.com/p/gas-town-agent-control-citadel