Help CERN by Playing With 300 Terabytes of Public Large Hadron Collider Data

You can analyze the data even if you don't have any special software.

by Kastalia Medrano

The European Organization for Nuclear Research (CERN) has just released, through its Open Data Portal, a massive trove of data collected by its Large Hadron Collider. The information comes in two main formats: primary datasets, which represent the original data CERN scientists would have analyzed, and derived datasets, which are simplified to aid in analysis and to make the data a potentially powerful teaching tool. CERN is also providing a “Virtual Machine,” available online to everyone for free, to help users analyze the data. That means that no matter what type of computer you’ve got, you can mess around with the results of one of the grandest experiments in the history of mankind.

Best known for discovering the Higgs boson a few years back, the Large Hadron Collider is the most powerful (and famous) particle accelerator in the world. The data CERN has released from it totals 300 terabytes. Of particular interest to many might be the inclusion of more than 100 terabytes of data on proton collisions at 7 TeV, which, according to CERN, make up about half the data the CMS detector collected in 2011. The last such data release from CERN was in November 2014, but comprised only around 27 terabytes.

Once you’ve installed the VM (just click on “Install your Virtual Machine” and follow the instructions), you can begin analyzing the data from the Compact Muon Solenoid (CMS) detector. You can do this even if you don’t otherwise have access to the specialized software usually needed to process data of this kind, because it comes preloaded on the Virtual Machine. You can then choose to work on either a primary dataset or a reduced (derived) dataset.

“Members of the CMS Collaboration put in lots of effort and thousands of person-hours each of service work in order to operate the CMS detector and collect these research data for our analysis,” said Kati Lassila-Perini, a CMS physicist in charge of data-preservation, in a statement. “However, once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly. The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow. And personally, as CMS’s data-preservation coordinator, this is a crucial part of ensuring the long-term availability of our research data.”

CERN representatives also expressed a desire for fresh eyes on the data, saying that releasing information in such vast quantities helps foster a spirit of collaboration and thus facilitates more discoveries.

“We are very pleased that we can make all these data publicly available,” Lassila-Perini said in the statement. “We look forward to how they are utilised outside our collaboration, for research as well as for building educational tools.”
