If we cloned the best human writer a billion times, they'd very quickly stop being the best writer. We would all get used to (and tired of) their stylistic quirks and patterns. This is basically how LLMs work, and I feel like that's an underappreciated reason for why people hate LLM writing so much.
I really loved the writing style (particularly in educational / explanatory contexts) of Gemini 2.5 Pro. For a week. After that week, "imagine a football field" really got old.
There's a serious tradeoff here. If you train your LLM to have its own distinct personality, voice and style, people will quickly start to recognize those characteristics and get tired of them. If you make the LLM bland... well, you have a bland LLM.
This problem also comes up in other contexts. If you ask your LLM for a fantasy character, you'll get a gnome named Pip. If you ask it for a sci-fi story, you'll get the same two or three character names (and they even repeat across providers!). LLMs just aren't that good at randomness, and they tend to repeat the same patterns over and over again, unless specifically prevented from doing so. This can be fixed by injecting randomness into the LLM's context.
We don't just need an LLM which is good at writing, we need an entire writing agent. Something that would identify the aspects of the prompt that need to be randomized, generate the options for them (LLMs will give you much more of a long-tail distribution if you ask for 150 examples of x and then randomly pick one), and then inject the choices into the writing prompt.
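To make that concrete: here is a minimal sketch of the randomization step, assuming a hypothetical llm_complete(prompt) helper that stands in for whatever provider API you use. The 150-examples trick is from the comment above; everything else is illustrative.

```python
import random

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever LLM provider you use."""
    raise NotImplementedError

def pick_random_option(attribute: str, n: int = 150) -> str:
    # Asking for many examples up front reaches further into the long tail
    # of the distribution; picking one at random sidesteps the model's
    # favorites (the gnome-named-Pip problem).
    listing = llm_complete(f"List {n} distinct examples of {attribute}, one per line.")
    options = [line.strip("-* \t") for line in listing.splitlines() if line.strip()]
    return random.choice(options)

def write_story(premise: str) -> str:
    # Randomize the aspects the model would otherwise converge on,
    # then inject the choices into the writing prompt.
    name = pick_random_option("fantasy character names")
    quirk = pick_random_option("unusual character quirks")
    return llm_complete(
        f"Write a story about: {premise}\n"
        f"The protagonist is named {name} and has this quirk: {quirk}."
    )
```

The same trick generalizes to any attribute the model tends to collapse on: settings, opening lines, plot beats.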
AI writing is often sterile. I compare it to someone who mastered the “rules” of freshman composition (e.g., never split an infinitive, begin a sentence with a conjunction, or use a sentence fragment). But great writers know when to “break” these rules to great effect. And therefore communicate the precise voice you’re describing.
This chunk would fit right into the piece. I love when I discover a chance to break the rules.
I think this is why those of us who did a lot of reading and writing in the pre-AI era can generally spot AI writing with ease. The “AI stink” can be overwhelming. I will often use the tools to assist with organization and structure/scaffolding but nevertheless insist my own voice resonates clearly.
I await “good writing” evals. Or is good writing like pornography? And no, I don’t mean erotica; you know what I mean…
I really like this piece! The way I see it, good writing has "spike." By definition good writing is an outlier and requires some unique take or unique personality - which today's models aren't trained for.
The point about style being suppressed by multi-objective training really resonates. If we want AI that writes with personality and voice, the whole training pipeline needs to prioritize creative nuance, not just token prediction.
Good post. Agree on your points, and did some work to try and poke models to write better here: https://www.strangeloopcanon.com/p/can-we-get-an-ai-to-write-better
I have to say I disagree with the idea that LLMs would be fully capable of writing well if only some reprioritizations were made.
The basic issue in my view is that what keeps language alive - in whatever form - is the continual effort to make it new.
For example, poetry is especially known for taking timeless truths, worn down from repetition into empty platitudes, and making them new by reformulation in the present moment.
The premise of LLMs is to do exactly the opposite: add repetition to the already said. And the more something has been said, the more likely it is to be said yet again.
LLMs are platitude machines. It’s ironic that a model based on language can’t write. But maybe we should judge this less harshly. Exploiting the power of language - instrumentalizing it in this way - probably should make writing sound like a dead cliché: it’s all about the intention, and dead language is the proper manifestation of the intention put into the design. Poetry, and good writing in general, comes first from care. LLMs don’t know anything about that.
I contest the assertion that AI images are beautiful; they're just as mid as the sentences. They have the same empty sophistication.
Agree! Ask any visual artist.
Didn't expect this take; always appreciate your insights.
Imo good writing can be better improved by the right app-layer abstraction algorithms, where you apply very strict prompts based on the target audience you're writing for as context and parse the zeitgeist.
I agree 💯
Has anyone that you know of begun to train a model to write better? Are any of the commercially available writing tools (Lex, Sudowrite, etc.) better than the models?
I agree that the market is smaller, but I believe as more people begin to recognize AIs' repetitive patterns, there's going to be a desire from a lot of creators to be able to stand out from that. Today, it is incredibly hard to get the models to do it even with a lot of prompting and instructions.
As someone who's way out of their depth 😃, I have been thinking about using a base model and then creating some software tool to help automate the training process for an individual user.
Am I dreaming that this is a process that would work and could be developed?
"Good" compared to what and according to whom. Having read lots of awful human writing from experienced coworkers to teen students I've taught and tutored, most LLMs are at least as good as the modal native English speaking, college-educated adult. And occasionally, recently, I've been floored with the quality of writing from Claude/Gemini.
There is at least some evidence that many people prefer/don't dislike AI-generated prose and that fine-tuning on trained writers is the answer you're looking for: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5606570. (I am in no way involved with this paper.)
I think a more accurate take would be that most LLMs, out of the box, don't generate writing you (an avid writer) prefer. But that's fine, I hate reading Dickens, Gladwell, and just about every writer at The Atlantic.
Also, I don't really concede that this (super high quality prose with a strong authorial tone) is something we should care all that much about models getting better at. You could make a good argument that we explicitly want this to stay a human thing. Unless you feel that somehow there will be knock-on effects on AIs' abilities to make scientific discoveries or do some other downstream high-value task if we make them better writers? N=1 but I want great AI writers to the same extent that I want great AI basketball players in the NBA dunk contest. It's just cooler when a human does it.
I think it displays flair. If it were a student, I would label it promising.
But when it drafts something for me I spend many hours unwriting it.
I have made rules: "DON'T USE ADVERBS," for one. That helps.
Most AI writing feels mid because the underlying models aren’t reasoning — they’re pattern-matching. When the system has no internal structure, no objectives, no constraints, and no causal model, the output collapses to “statistically average prose.”
What fixes this isn’t more style or bigger models. It’s architecture. When you force the model into a structured reasoning process — clear problem framing, system modeling, alternatives, constraints, red-team, verification — the writing stops being generic prediction and starts becoming deliberate thinking.
The gap isn’t the model. It’s the absence of a thinking framework. AI writing is mid because AI reasoning is unstructured. Solve the reasoning architecture and the writing quality jumps immediately.
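A minimal sketch of what that staged scaffold might look like, assuming a hypothetical llm(prompt) helper standing in for whatever provider API you use; the stage names come from the comment above, everything else is illustrative:

```python
STAGES = [
    ("framing", "State the precise problem this piece must solve for the reader."),
    ("modeling", "Describe the system being written about and its causal structure."),
    ("alternatives", "Give three genuinely different angles, with tradeoffs."),
    ("constraints", "List the constraints: audience, length, claims that must not be made."),
    ("red_team", "Attack the chosen angle: where is it generic, wrong, or unearned?"),
    ("verification", "Check each claim in the plan and flag the shaky ones."),
]

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever model you use."""
    raise NotImplementedError

def structured_draft(topic: str) -> str:
    # Carry each stage's output forward so the final draft is conditioned
    # on explicit reasoning rather than on the topic alone.
    notes: list[str] = []
    for name, instruction in STAGES:
        context = "\n".join(notes)
        notes.append(f"[{name}] " + llm(
            f"Topic: {topic}\nNotes so far:\n{context}\n\nTask ({name}): {instruction}"
        ))
    return llm("Write the piece, honoring all the notes below.\n" + "\n".join(notes))
```

Whether this actually lifts prose quality is an open question; what it clearly does is make the reasoning the model conditions on explicit.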
So you’re submitting this comment as exhibit A?
Hahaha yeah
Agree on diagnosis and incentive structure.
Two promising areas for research IMO:
- reverse-engineering implied theory of mind. When we read, we form a model of the author's mind; that model is a large part of what shapes our impression of good writing. Seed the generation with chunks of text that produce the desired impression.
- model the act of reading in great detail using ethnographic and digital ethnographic data, then iterate at scale.