A computer program commonly used to predict recidivism turns out to be not too reliable.

Just like a professional chef or a heart surgeon, a machine learning algorithm is only as good as the training it receives. And as algorithms increasingly take the reins and make decisions for humans, we’re finding out that a lot of them didn’t receive the finest education, as they mimic human race- and gender-based biases and even create new problems.

For these reasons, it’s particularly concerning that multiple states, including California, New York, and Wisconsin, use algorithms to predict which people will commit crimes again after they’ve been incarcerated. Even worse, these algorithms don’t even seem to work.

In a paper published Wednesday in the journal Science Advances, a pair of computer scientists at Dartmouth College found that a widely used computer program for predicting recidivism is no more accurate than completely untrained civilians. This program, called Correctional Offender Management Profiling for Alternative Sanctions, analyzes 137 different factors to determine how likely it is that a person will commit another crime after release. COMPAS considers factors like substance use, social isolation, and other elements that criminologists theorize can lead to recidivism, ranking people as high, medium, or low risk.

Machine learning algorithms that gauge incarcerated people's risk of recidivism are deeply flawed, say researchers.

And sure, risk assessment sounds great. Why not have more data to help courts determine who is a greater risk? But what Dartmouth computer scientists Julia Dressel and Hany Farid found was that untrained individuals correctly judged recidivism risk with just about the same accuracy as COMPAS, suggesting that the supposed power of the algorithm isn’t actually there.

In one trial that included just a fraction of the information used by COMPAS (seven factors instead of 137, and excluding race), a group of human volunteers on the internet, with presumably no training in criminal risk assessment, evaluated case reports. They predicted whether a person would reoffend with 67 percent accuracy, compared to COMPAS’s 65 percent.

Take a moment to let that sink in. Untrained people on the web were slightly better at predicting whether a person would go back to jail than the tool that is literally designed to predict whether a person would go back to jail. And it gets worse. Once you add a defendant’s race, the volunteers’ false-positive and false-negative rates were within just a few percentage points of COMPAS’s. So not only is COMPAS not that great at predicting recidivism, it’s just as prone to racial bias as humans are. So much for the cold logic of computers.

Rates of false-positives and false-negatives of humans versus COMPAS.
The researchers found that humans were almost as good as the algorithm at predicting recidivism. They also found that humans and the algorithm had similar rates of false positives and false negatives when race was factored in.
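For readers who want to see what those numbers mean, here is a minimal sketch of how accuracy, false-positive rate, and false-negative rate can be computed for any predictor, human or algorithmic, and then broken out by group. The function names and the tiny dataset are made up for illustration; none of it comes from the study or from COMPAS itself.

```python
from collections import defaultdict


def error_rates(predicted, actual):
    """Return (accuracy, false_positive_rate, false_negative_rate).

    predicted[i] is True if person i was predicted to reoffend;
    actual[i] is True if person i actually did.
    """
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1  # flagged as high risk, did not reoffend
        elif not p and a:
            fn += 1  # flagged as low risk, did reoffend
        else:
            tn += 1
    total = tp + fp + tn + fn
    nan = float("nan")
    accuracy = (tp + tn) / total if total else nan
    fpr = fp / (fp + tn) if (fp + tn) else nan
    fnr = fn / (fn + tp) if (fn + tp) else nan
    return accuracy, fpr, fnr


def error_rates_by_group(predicted, actual, groups):
    """Compute the same three rates separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for p, a, g in zip(predicted, actual, groups):
        buckets[g][0].append(p)
        buckets[g][1].append(a)
    return {g: error_rates(ps, acts) for g, (ps, acts) in buckets.items()}


# Invented toy data, for illustration only.
predicted = [True, True, False, False, True, False, True, False]
actual = [True, False, False, True, True, False, False, False]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(error_rates(predicted, actual))
print(error_rates_by_group(predicted, actual, groups))
```

Two predictors can have nearly identical overall accuracy and still make very different kinds of mistakes for different groups, which is why the false-positive and false-negative rates are reported separately.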

The researchers then built a linear model that matched COMPAS’s accuracy using just two factors: age and number of previous convictions. Just to be clear, this model would also be unfair, but it demonstrates just how flawed COMPAS is.
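To give a sense of how simple a two-factor model is, here is a rough sketch of one. The article only says "linear model"; this example assumes logistic regression trained by plain gradient descent, which is one common choice and not necessarily what the researchers did. The function name, learning rate, and the eight-row dataset are all hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_two_factor_model(ages, priors, reoffended, lr=0.1, epochs=2000):
    """Fit a logistic regression on two inputs: age and prior convictions.

    Returns a function predict_proba(age, prior_count) giving the model's
    estimated probability of reoffending. Hypothetical helper, for illustration.
    """
    n = len(reoffended)
    mean_a = sum(ages) / n
    mean_p = sum(priors) / n
    # Standardise both inputs so one learning rate suits both weights.
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in ages) / n) or 1.0
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in priors) / n) or 1.0
    xs = [((a - mean_a) / std_a, (p - mean_p) / std_p)
          for a, p in zip(ages, priors)]

    w_a = w_p = b = 0.0
    for _ in range(epochs):
        g_a = g_p = g_b = 0.0
        for (xa, xp), y in zip(xs, reoffended):
            err = _sigmoid(w_a * xa + w_p * xp + b) - y
            g_a += err * xa
            g_p += err * xp
            g_b += err
        # Average-gradient step on the log loss.
        w_a -= lr * g_a / n
        w_p -= lr * g_p / n
        b -= lr * g_b / n

    def predict_proba(age, prior_count):
        xa = (age - mean_a) / std_a
        xp = (prior_count - mean_p) / std_p
        return _sigmoid(w_a * xa + w_p * xp + b)

    return predict_proba


# Invented toy data, for illustration only (1 = reoffended, 0 = did not).
ages = [19, 22, 25, 30, 35, 41, 48, 55]
priors = [4, 6, 3, 5, 1, 0, 1, 0]
reoffended = [1, 1, 1, 1, 0, 0, 0, 0]

predict_proba = fit_two_factor_model(ages, priors, reoffended)
print(predict_proba(20, 6), predict_proba(55, 0))
```

The whole model is three numbers: one weight for age, one for prior convictions, and an offset. That a predictor this small can keep pace with a 137-factor tool is the point of the comparison.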

And while this research is new, its main takeaways are not. In a 2016 investigation, ProPublica reporters found that not only is COMPAS unreliable, it’s actually systematically biased against African Americans, consistently rating black people as higher risk than whites who committed more serious crimes. Hopefully, this new research will help pave the way for more just risk assessment processes in the criminal justice system.

The fact that COMPAS is useless at best and deeply biased at worst suggests that computer-based risk assessments could be deepening the injustices that the justice system is supposed to address. Since risk assessment scores can be applied at any step of the criminal justice process, including setting a person’s bond, determining whether they’re granted parole, and, in some states, even determining a person’s sentence, this research suggests a dire need to reexamine the use of COMPAS and other programs.