Disinformation with LLMs: the distribution risk
Why the changing dynamics of distributing malicious content are a growing concern, beyond just the generated content itself.
It’s often stated that disinformation, specifically lowering the barrier to generating harmful content, is the leading practical concern of the large language model (LLM) proliferation phase we are in. When you dig into this problem, it turns out disinformation has been fairly easy to generate for a while; now it’s getting worse. I’m interested in predicting how existing challenges get 2x worse, but we also need to study how AI technologies can create new attack vectors on the social order.
We’ve seen cheaply generated disinformation in Russian media during recent US elections (and China runs similar operations). Combine this with the extreme pressure on moderation at internet platforms, and this dynamic has been playing out for some time: tons of content created, tons of content moderated. For these malicious actors, the challenge is often less about creating as much content as possible and more about getting that content in front of people. The bigger potential risk is LLMs shifting this balance between moderation and generation.
When assessing AI risks, we often focus on what happens when the abilities or marginal costs of a certain task go toward zero or infinity. Content creation has been discussed at length here: it's assumed that the golden days of internet content are ending and that distribution power is returning to established players.
This re-centralization of power is actually dependent on more nuanced dynamics, involving multiple stages of the information lifecycle. In the modern world, information travels through many bottlenecks that impact what AI can do. Roughly, the entertainment economy can be split into two phases: generation of content and delivery (or distribution) of content. It is obvious that LLMs make generation extremely cheap, but what is less obvious is what will happen to the distribution equation.
My worldview around this grew a lot from a conversation with Sayash at FAccT, one of the authors of the AI Snake Oil Substack (and book). Their post on this topic covers the social risks of the technology, going into many attack vectors, including spearphishing and generated content used for blackmail.
In this article, I’ll cover how this could happen in general terms, rather than repeat all of their great work on the sociotechnical implications. The ways that LLMs can hijack other parts of the global information ecosystem are crucial to understanding why some people are worried about these technologies. Finally, I’ll brainstorm what we can do about it.
Bypassing moderation with LLMs
Ultimately, language is the interface through which so much information on the internet is gated. Passwords are the most obvious form of this, with very few people using any physical security for their computers (and even that hardware is often still interfaced with via unique text). Large internet platforms filter every piece of content hitting the site, especially content from new accounts. An emerging dynamic will be reinforcement-learning-style exploitation integrated with the most popular systems: through the lens of exploiting moderation filters, LLMs could be tuned with RLHF to make their text seem more authentic.
Technical note: this post focuses on moderation at the text level, but in reality, a lot of moderation filters on IP addresses, user behavior, account history, etc. Some of those signals are physically grounded and can likely be avoided, but some moderation (e.g. Gmail spam filtering) likely relies much less on them because email is an open protocol.
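To make the distinction concrete, here is a minimal sketch of what purely text-level moderation looks like. The blocklist patterns and example posts are illustrative placeholders I made up for this sketch, not any platform's actual pipeline, which would use learned classifiers rather than regular expressions.

```python
import re

# Illustrative surface-level signals only. The key property is that the filter
# sees nothing but the text of the post: no IP address, no account history.
BLOCKED_PATTERNS = [
    r"(?i)wire \$?\d+ to",         # crude scam signal
    r"(?i)click here to verify",   # crude phishing signal
    r"(?i)limited time offer",     # crude spam signal
]

def text_only_moderation(post: str) -> bool:
    """Return True if the post should be flagged based on its text alone."""
    return any(re.search(pattern, post) for pattern in BLOCKED_PATTERNS)

print(text_only_moderation("Click here to verify your account"))  # True
print(text_only_moderation("Hi, following up on our meeting"))    # False
```

The point is that a filter like this, or its learned equivalent, only has surface features of the text to work with. A model optimized against it just has to produce content that avoids those signals, whereas IP- or account-based moderation uses signals that text generation alone cannot touch.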
Platforms that rely on content-based moderation are likely totally screwed when it comes to this. That looks like smaller players (e.g. BlueSky) and players notorious for weak moderation practices (Twitter) becoming susceptible to increased phishing, disinformation, impersonation, etc. Larger companies like Meta will likely be fine with their more sophisticated authentication measures.