Facebook is teaching AI to make sarcastic and hurtful memes

Don't worry, it's not to neg you on the TL. Researchers hope to train the AI to seek out nasty memes and shut them down.


Facebook wants your help training its artificial intelligence engines to recognize hateful memes. The company yesterday launched the Hateful Memes Challenge in conjunction with data science competition company DrivenData, offering a $100,000 prize pool to researchers who submit models based on its Hateful Memes dataset. The contest is open now and runs until the end of October.


Facebook has long relied on AI moderation to assist its human moderators, but COVID-19 stay-at-home orders forced the company to send home all its human moderators with pay. That’s left Facebook no choice but to rely more heavily than ever on its AI moderation protocols — and they’re not exactly doing a great job at automatically flagging misinformation.

The OG social network also released a Community Standards Enforcement Report on Monday. Facebook releases these reports every so often to outline how well the company is doing with enforcing its Community Standards. Because this reporting period overlaps with the beginnings of the COVID-19 pandemic, much of the new report speaks to how Facebook AI has been handling its increased moderation load.

Facebook’s track record with hate speech and other unsavory posts is bleak, to say the least, so much so that it’s left the company’s human moderators with boatloads of trauma. Successfully training artificial intelligence to take on this burden would be a healthy step forward for the social network’s practices. But the challenges associated with training AI to recognize hate speech — especially in meme form — are many. It's likely we won't see meaningful results for a long time.

About that data set — The Hateful Memes Challenge is based on Facebook’s Hateful Memes dataset, which is basically the company’s attempt at training a computer to recognize when a specific image paired with specific text is hateful.

Examples from Facebook. These are not from the actual dataset.

To build the dataset, Facebook created more than 10,000 examples of multimodal content (that’s image plus text). The company says the examples cover a wide variety of hate speech and image types, starting from real hateful memes that had been shared on the platform. The memes cover protected categories such as religion, gender, and sexual orientation, as well as types of attacks such as inciting violence or portraying groups of people as criminals or terrorists.

To prevent potential misuse, Facebook says only verified researchers will be able to access the full dataset.

Still a long way to go on this — Facebook’s AI models are getting better, but they’re nowhere near accurate enough yet. In Facebook’s own tests on the dataset, several well-known model architectures all fell well short of human moderators.

In the company’s testing, even the most complex multimodal models were only able to achieve about 64 percent accuracy. Many of the models fell under 60 percent accuracy. Meanwhile, human moderators were able to identify hateful memes with close to 85 percent accuracy.

How do you train AI to understand memes? — Facebook is on the right track here. The company’s dataset is by far the most complex and accurate available right now, but even models trained on it continue to fall short.

Memes are complex. They often rely on tone to create meaning: sarcasm runs rampant in meme creation, and AI is not very good at detecting it. The text paired with a particular image can carry so many layers of meaning that it seems impossible to ever really train AI to understand memes. The models will always miss the mark.

Facebook is exploring how AI can better detect other harmful information, too, such as the COVID-19 misinformation that’s still circulating despite the social network’s best efforts.

Maybe some researchers will be able to use Facebook’s Hateful Memes dataset to train better AI moderators. It’s doubtful that we’ll see a model with human-like accuracy any time soon. But hey, maybe you could win a share of that $100,000 prize pool trying to solve the issue.