The one problem with AI content moderation? It doesn’t work
No one was quite sure where the never-ending flurry of fires and Indonesian car crashes was coming from. But the system would keep flagging them, according to Josh Sklar.
The former content moderator worked on the team at Instagram that assessed posts flagged by artificial intelligence (AI) as likely to be problematic. And while the system would regularly catch that illicit content, the number and nature of false positives were confusing at best.
In the UK, the onus on social media platforms to moderate toxic content online is only set to become more intense with the passage of the Online Safety Bill. One clause in particular – calling on platforms to “prevent” users ever encountering dangerous content – has convinced many that the platforms will turn to greater automated moderation to try to solve the problem.
The only problem? It might not work.
Last year, the BBC tried to use an AI tool to measure the scale of toxicity faced by politicians online. It identified that some 3,000 “toxic” tweets are sent to MPs every day.
The issue was the AI defined “toxic” to mean anything “rude, disrespectful or unreasonable”, meaning plain descriptive words like “Tory” and “hypocrite” were often flagged. One Twitter user pointed out that the tool labelled anti-trans vitriol as less toxic than calling someone a “transphobe”.
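That failure mode is easy to reproduce with any scorer that keys on individual words. The toy model below is not the BBC’s tool; its word list and scores are invented purely to show how word-level scoring ends up flagging descriptive terms.

```python
# Toy word-level toxicity scorer illustrating the false-positive problem.
# All scores are invented for illustration; real classifiers learn these
# weights from labelled data rather than a hand-written table.
WORD_SCORES = {"idiot": 0.9, "hypocrite": 0.8, "tory": 0.6}  # invented values
THRESHOLD = 0.5

def is_toxic(text: str) -> bool:
    """Flag a post if any single word scores above the threshold."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return max((WORD_SCORES.get(w, 0.0) for w in words), default=0.0) > THRESHOLD

is_toxic("You are a hypocrite")  # → True, although the word is merely descriptive
```

A model like this has no way to distinguish an insult from an accurate description, which is exactly the gap the BBC experiment exposed.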
In many ways, that struggle to define “toxic” is at the core of the problem for AI content moderation systems: their aim – to moderate and “fix” the unspoken flaws and dangerous sentiments in the grey world of human interaction – is hard for a machine learning system to fully achieve.
“The AI would work well in proactively removing the worst stuff, as they do now with images of violence, for example,” says Eugenia Siapera, director of the centre for digital policy at University College Dublin. “But the harder decisions cannot be automated.”
Automated moderation in practice
To understand why these systems tend to have so many flaws, it’s first worth looking at how automated moderation works. Like any machine learning system, automated moderation is based on a vast database of rules and example posts that train the system to spot similarly illicit content. This will be informed by a list of certain prohibited keywords, which will vary from site to site.
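In code, the keyword layer of such a pipeline can be as simple as a blocklist lookup that routes matching posts to review. This is a minimal sketch under invented terms – no platform’s actual list – and real systems layer trained classifiers on top of it.

```python
# Minimal sketch of keyword-based pre-filtering, the simplest layer of an
# automated moderation pipeline. The blocklist is invented for illustration;
# production systems combine lists like this with trained classifiers.
BLOCKLIST = {"scamlink", "buy-followers"}  # hypothetical prohibited terms

def flag_post(text: str) -> bool:
    """Return True if the post contains any prohibited keyword."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

flag_post("Click this scamlink now!")   # → True: exact keyword match
flag_post("Click this scam1ink now!")   # → False: trivially evaded
```

Note how a one-character substitution already slips past the filter – a weakness that matters in the “arms race” described below.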
But there are plenty of complications to that. One example is just how culturally specific social interactions are. Siapera explains: “Let’s say the system operates in English, but what kind of English is that? Is it the American one? Is it the British one? Is it the Irish?” She cites the word “traveller”, which in Ireland represents a recognised ethnic group but may not have the same meaning elsewhere in the world. That kind of variation would be hard for a totalising automated system to spot.
And just as staff and algorithms are trained to spot dangerous content, the people they are policing think of new ways to evade them. That tendency is what underpins the rise in popularity of Nazi dogwhistle codes online, such as 88 (which stands for “Heil Hitler”) or 14 (for the 14-word slogan of far-right terrorist David Eden Lane).
“It creates this weird arms race with racists,” as Sklar puts it. “There’s just so many new slurs all the time.”
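One common, if easily outrun, countermeasure is to normalise text before matching, so that simple character substitutions no longer evade the list. The substitution map and the blocked term below are purely illustrative examples, not any platform’s real rules.

```python
# Sketch of normalisation before blocklist matching: fold common character
# substitutions back to letters so "sp4m" still matches "spam". The map and
# the blocked term are invented examples for illustration only.
SUBSTITUTIONS = str.maketrans("4301!$", "aeoils")
BLOCKLIST = frozenset({"spam"})  # hypothetical blocked term

def normalise(text: str) -> str:
    return text.lower().translate(SUBSTITUTIONS)

def is_blocked(text: str) -> bool:
    normalised = normalise(text)
    return any(term in normalised for term in BLOCKLIST)

is_blocked("Buy my SP4M here")  # → True despite the substitution
```

Evasion then simply moves on to spellings the map doesn’t cover, which is why moderators describe this as an arms race rather than a solved problem.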
The issue is only made worse by how opaque these systems are. Few inside major tech companies, let alone outside them, fully know how these (usually proprietary) systems function. And that opacity extends beyond how a system works to the exact nature of the datasets fed into the algorithm that shape its behaviour.
“In other safety-critical industries, that kind of technology would be subject to independent third-party open testing,” says Full Fact’s head of policy, Glen Tarman. “We repeatedly crash cars against walls to test that they are safe, but internet companies are not subject to the third-party independent open scrutiny or testing needed.”
It’s also worth bearing in mind that this type of automation is still in its infancy. “Artificial intelligence has really only been around since 2017 within the content moderation space. Before then, there wasn’t the processing power available,” says Cris Pikes, chief executive of AI-based content moderation firm Image Analyzer. “The models were fundamentally too simplistic at that stage.”
All of this means that having some kind of human oversight in the system is necessary, according to those Computer Weekly spoke to, to teach the algorithm to spot the latest trends in racist content or the different regional meaning of terms.
But even if there is nominally human oversight, there’s a question as to how extensive it can be. Meta, for example, employs just 15,000 content moderators to cover Instagram and Facebook, according to the New York Times – roughly one per 333,000 active users. The sheer volume of content places a huge burden on the few staff who do moderate it: under that pressure, moderators spend less than 30 seconds assessing the problems with a post.
But the complications go beyond just AI moderation itself, to how it would operate. Clause 9 of the proposed Online Safety Bill – which would create an obligation for tech firms to “prevent individuals from encountering” dangerous content – is a particular cause for concern for many campaigners.
“My interpretation is it would need filtering of content as it’s uploaded… That’s a concern, because if you’re taking something down before it’s published, that is a serious interference in freedom of expression,” says Monica Horten, a policy manager at Open Rights Group. “I think it’s a very dangerous slippery slope to go down.”
She argues that such a move would constitute prior restraint – a form of government censorship that prohibits certain content before publication – and something that is usually only allowed in the UK in certain specific legal circumstances, like injunctions.
“If it’s taken down before it’s even gone up, you might not even know anything is wrong,” Horten adds. “The Online Safety Bill has no provisions for proper notification of users when the content is restricted. There’s a provision for a complaints process, but how can a person complain if they don’t even know it happened?”
In the bill as it stands, it is left to Ofcom, the telecoms and media regulator – an organisation whose new chair admits he doesn’t even use social media – to set guidelines for what is considered illicit content online.
The problem is that some form of automation may be required if we ever want to moderate social media platforms properly. Posts on Instagram alone run into the tens of billions – a scale even an army of human moderators would struggle to vet manually.
A fundamentally social problem
There is also a need for legislation like the Online Safety Bill, according to most of those Computer Weekly spoke to, or at least something that aims to codify laws to govern what are currently largely ungoverned online spaces.
“The laws have not kept up with what happens online,” says Pikes. “So the government is in a really hard place to try and bring this in. They’re kind of damned if they do and damned if they don’t.”
Where that leaves everything is somewhat complicated. To start with, there are certainly plenty of ways to tinker with the Online Safety Bill to improve it – from clearer guidelines on moderation and what constitutes illicit content, to improving the complaints process for those who are unfairly restricted, to measures that minimise how much content platforms take down.
But many wonder if the debate misses the core of the issue.
“What is the problem here? Is it a problem of technology? Racism on social media is a symptom of broader structures… These are social problems, and they have to be addressed holistically,” says Siapera. “If you try to control this problem at the level of circulation, it’s just this never-ending Whac-A-Mole kind of game.”