Claude Code vs Codex: What Changed When I Switched Desktop Apps

I do not want to write the normal comparison post.

The normal version would make a table. Claude Code on one side, Codex on the other. Checkmarks for “terminal,” “diff review,” “worktrees,” “browser,” “mobile,” “model quality,” and whatever else makes the post look useful before anyone has actually done the work.

That is not the comparison I care about.

I care about what happens when the app is open on my actual machine, pointed at my actual repo, and I am trying to get something finished without letting the agent quietly make a mess.

That is the difference I feel between Claude Code and Codex right now.

The Claude Code era

Claude Code was the first AI coding tool that made me feel like I could build things that were bigger than one prompt.

Mandarin Deck came from that period. The first version was a single 10.8 MB HTML file with inline JavaScript, CSS, and word data. No backend. No build step. No proper architecture. It was ugly in the way working first versions are allowed to be ugly.

Then the app grew teeth.

I added Supabase sync. Google OAuth. Local-first progress. IndexedDB. Service workers. Reading mode. Native-speaker audio. A migration to the official November 2025 HSK 3.0 list. The kind of work where every small decision creates three future bugs.

Claude Code was good for that because it fit the rhythm I had then: terminal-first, repo-first, one long conversation where I could say, “look at this file, explain what is happening, patch the smallest thing.”

It helped me get through bugs I would not have known how to frame alone.

The reset bug is the one I remember most clearly. A user tapped “Reset all progress,” saw success, came back later, and the app had restored the old progress. The code cleared localStorage. IndexedDB still had the old state. On reload, the wrapper repopulated localStorage from IndexedDB. The app undid its own reset.

That is the kind of bug AI is weirdly good for if you give it the whole trail. Not “fix reset.” More like: here is the storage layer, here is the user report, here is what persists after reload, find the mismatch.
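
Here is a minimal sketch of that mismatch. It is not the actual Mandarin Deck code. The two Maps are hypothetical stand-ins for localStorage and IndexedDB so the sketch runs outside a browser, and every function name (hydrate, resetFastOnly, resetAll, simulateReload) is invented for illustration.

```typescript
// Hypothetical stand-ins: `fast` plays the role of localStorage and
// `durable` plays the role of IndexedDB. Plain Maps keep the sketch runnable
// anywhere; the real storage wrapper is assumed, not reproduced.
type Store = Map<string, string>;

// On load, the wrapper repopulates the fast store from the durable one.
function hydrate(fast: Store, durable: Store): void {
  durable.forEach((value, key) => {
    fast.set(key, value);
  });
}

// Buggy reset: clears only the fast store, so the durable copy survives.
function resetFastOnly(fast: Store, _durable: Store): void {
  fast.clear();
}

// Fixed reset: clears both layers so hydrate has nothing left to restore.
function resetAll(fast: Store, durable: Store): void {
  durable.clear();
  fast.clear();
}

// A reload starts with an empty fast store and hydrates it again.
function simulateReload(durable: Store): Store {
  const fast: Store = new Map<string, string>();
  hydrate(fast, durable);
  return fast;
}

// Returns true if old progress is still visible after reset plus reload.
function progressSurvivesReset(
  reset: (fast: Store, durable: Store) => void
): boolean {
  const durable: Store = new Map<string, string>();
  durable.set("progress", "some saved progress");
  const fast = simulateReload(durable);
  reset(fast, durable);
  return simulateReload(durable).has("progress");
}

const buggyResult = progressSurvivesReset(resetFastOnly); // true: reset undone
const fixedResult = progressSurvivesReset(resetAll);      // false: reset sticks
console.log({ buggyResult, fixedResult });
```

The point of the sketch is the shape of the trail, not the fix itself: once both layers and the reload path are on the page together, the mismatch is hard to miss.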

For a while, that was enough. Claude Code felt like a patient pair programmer in the terminal.

The desktop app changes the question

Claude Code has a desktop app now. Anthropic describes it as a graphical interface for running multiple sessions side by side, with a sidebar, integrated terminal and file editor, visual diff review, live app preview, PR monitoring, and scheduled tasks. The Code tab gives it direct local file access where I approve changes in real time.

That matters.

The desktop shape makes Claude Code less like “a terminal agent I talk to” and more like a workspace. Files, terminal, diff, preview, chat. For people who hated the terminal ceremony, that is a big shift.

But my mental model of Claude Code is still shaped by the earlier work. I reach for it when I want a tight coding session around a known project, especially older repos that already have Claude-era instructions and habits.

Claude Code feels best when I am steering closely.

Not because it cannot run further. It can. But my trust pattern with it was built through interruption, correction, and small patches. “Stop. Look at this file. Why did you do that?” That still feels natural.

Codex feels more like managing work

Codex hit me differently.

OpenAI describes the Codex app as a focused desktop experience for working on Codex threads in parallel, with built-in worktree support, automations, and Git functionality. The app docs also call out reviewing and shipping changes, terminal actions, an in-app browser, a Chrome extension, Computer Use, app screenshots, skills, plugins, artifacts, and sidebars.

That sounds like product copy until you use it on a messy site refresh.

The personal site update was the first place Codex felt obviously better for me. The job was not just code. It was copy, hierarchy, SEO metadata, blog posts, a stack page, browser preview, mobile wrapping, and careful wording around digital asset security.

The actual prompt was not complicated:

Update the homepage for career credibility. Keep the dark bento design. Do not imply I managed client funds. Move proof of shipped projects earlier. Add Codex and GPT-5.5 where the tooling is current. Run the build and check mobile chips.

That is the kind of prompt where I do not want one big patch and a prayer. I want a thread that can read, patch, build, open the app, inspect the DOM, check mobile, revise copy, and leave me a trail.

Codex was good at that. Not perfect. Good.

It changed the work from “can the model edit this file?” to “can the app carry the whole job without losing the proof boundary?”

That is the part I like.

Where Claude Code still feels better

Claude Code still feels good when the work is tight and local.

If I am in a repo with a known bug, especially one shaped around older Claude Code sessions, I still trust that rhythm. It is easy to say: inspect this function, explain the state machine, patch only this path, add a test, stop.

Claude also tends to feel conversationally careful. When I am trying to understand a codebase or talk through a design before touching files, I like that. It has a softer edge as a thinking partner.

The desktop app also lowers the beginner barrier. No separate CLI ceremony. Open the app, click Code, choose a project, review changes. For workshops, that matters. People get stuck before the first useful prompt far more often than online AI arguments admit.

If I were teaching someone their first AI coding session, Claude Code Desktop is easy to explain.

Where Codex is winning for me

Codex is winning the “whole task” surface.

When a job has writing, code, browser checks, screenshots, external docs, and a final handoff, Codex feels more natural to me. The thread model, worktree support, in-app browser, and review flow make it feel like I am supervising work rather than just prompting a tool.

That matters for my current work because my projects are not pure software.

RetireCalc needs tax logic, UI, disclaimers, privacy choices, and formula trace.

Mandarin Deck needs language data, sync, PWA behavior, and user-facing recovery paths.

This site needs copy that does not sound like AI wrote it, while still being technically current.

Fortress21 needs careful custody language: architecture, workflows, inheritance protocols, operational controls. Not “managed client funds.” Not vague trust language.

Codex fits that kind of mixed work better right now.

The failure mode is different too

Claude Code can get stuck in the code.

Codex can get too good at the whole page.

That sounds like a compliment until you read the copy. The first version of my Codex/GPT-5.5 article had the right structure and the wrong smell. It sounded like a calm machine explaining the lesson it had just learned. The sentences were reasonable. That was the problem.

I had to stop and make a “no AI slop” rule for myself:

Which sentence proves a person was there?
Which bug, room, command, screenshot, or diff is this anchored to?
Could any competent AI write this without seeing my work?
Is the ending a real observation, or just a tidy lesson?

That is now part of the workflow.

The stronger the tool gets, the more I need the review habit.

My current split

If I had to explain my actual usage now, it would be this:

Claude Code is where I go for tight repo work, especially older projects that came up in that ecosystem.

Codex is where I go for broader work that has to cross files, browser state, copy, sources, and final verification.

ChatGPT is where I think before the work becomes a patch.

Cursor and OpenCode are good workshop surfaces because people can see files change on their own laptops.

Make and n8n are for glue, not architecture.

None of this is permanent. These tools are moving too quickly for a forever take. But for my work right now, Codex feels like the stronger desktop command center. Claude Code still feels like the more familiar coding partner.

I do not think the question is “which one is smarter?”

The better question is: where does the tool put your attention?

Claude Code puts mine close to the code.

Codex puts mine on the whole job.

That is why I am using both, but reaching for Codex more when the work has to ship with receipts.

Sources

OpenAI Codex app docs: https://developers.openai.com/codex/app
Claude Code desktop quickstart: https://code.claude.com/docs/en/desktop-quickstart
Mandarin Deck build story: Shipping Mandarin Deck