Git With It

Coronavirus: Scientists explain what we're doing wrong in understanding its spread

The internet is mobilizing to fight coronavirus, but data scientists say we need more.

We need a lot of things to fight back against COVID-19, the infectious disease that has sickened over 120,000 patients worldwide. We need more hand sanitizer, the security to take time off from work, better public health habits, and ideally, a vaccine.

But what the data scientists who study coronavirus need is as unique to their profession as it is crucial to the rest of us: A powerful tool that could help us understand where the novel coronavirus might go next and how effective our management plans are against it.

And it's the kind of tool that can only exist online.

We need a single place to put all of the countless scientific papers and datasets that tell us every crucial detail about COVID-19, argue a team of scientists in an editorial published Wednesday in Science Translational Medicine. In the paper, they call it a "centralized exchange bank." It would be a publicly available place where everyone in the world studying the pandemic can find the information they need.

James Hyman is one of the editorial's co-authors and a mathematician at Tulane University. He tells Inverse that such a framework is crucial as we try to stop COVID-19, as well as future diseases:

"What we need is to create a stronger infrastructure so that every new emerging infection is not a fire drill to recreate an effective repository for the data and research papers," Hyman says.

Scientists already have an idea of what that "stronger infrastructure" would look like, says Hyman. Working with the Models of Infectious Disease Agent Study (MIDAS, a government organization that models disease outbreaks), they're already building this "centralized exchange bank" on GitHub.

The new MIDAS COVID-19 Repository, says Hyman, is supposed to make sure that all relevant information sees the light of day. It's still only in the early stages.

"We still have a long way to go," Hyman says.

What there is to gain

The scientific community learned its lesson about data protectionism during the 2014-2016 Ebola outbreak in West Africa. During that crisis, about 11,325 people died.

In an op-ed published in the New York Times in April 2015, the Chief Medical Officer of Liberia's Ministry of Health and her co-authors lamented that scientific papers from as early as the 1980s had predicted that Ebola might strike in Liberia. But those papers went unread by the people on the ground who eventually had to deal with the outbreak.

"Even today, downloading one of the papers would cost a physician here $45, about half a week’s salary," the authors wrote.

Keep in mind that COVID-19 is a far milder disease than Ebola. But already, COVID-19 has pushed scientists and publishers to make information more available.

Hundreds of academic journals, organizations, and publishers committed to making papers about COVID-19 open access as early as January 2020. On the subreddit r/DataHoarder, archivists organized around a petition called "Unlock Coronavirus research for world's scientists." When the petition closed, 32,544 scientific articles were made public.

Breaking scientific research is also being shared without the peer review process. In China, scientists who sequenced the genome of the virus that causes COVID-19 made it public on a niche virology site (COVID-19 chatter on the site is now incessant). The biology preprint server bioRxiv has pinned a banner to the top of its homepage: "bioRxiv is receiving many new papers on coronavirus 2019-nCoV," it reads.

Information about COVID-19 is spreading nearly as fast as the virus is. But Scott Layne, a professor emeritus at UCLA and another co-author of the new editorial, tells Inverse that it's not enough. Layne argues "we need better analytics" because they can help us evaluate how to combat the virus in real time, as well as tell us what works and what doesn't.

"If we want to do a better job of dealing with this kind of global emergency, we basically need to have better guidance," he explains. "If people wear face masks, or gel their hands, or restrict their contacts by a certain amount — it’s just guesswork what the impact of those measures are."

"Analytic models can at least help us put numbers on what these various interventions might do."

What should happen next

Hyman and Layne are part of the mathematical "modeling community," so sharing papers is not their only concern. They want the highest quality of data to be available so that we can be better at predicting the future of COVID-19, and that requires far more work than is already being done, Layne says.

"It would require an around-the-clock team to get those organized," Layne says.

Though Layne says that "nothing exists like that," Hyman adds that the MIDAS COVID-19 repository is a step in the right direction. He's taking the steps to put the idea that Layne describes into action. Its first, shaky steps are visible on GitHub.

"This is the beginning of the type of framework that we described in our editorial," Hyman says.
