When “bad” data gets sucked into a machine learning system (that’s how Alan Greenspan put it when discussing the computer models that failed to predict the 2008 recession), that information can be hard to dislodge. But a new concept, proposed by computer scientists Junfeng Yang and Yinzhi Cao, of Columbia University and Lehigh University, respectively, brings the idea of unlearning to computers. As Cao and Yang write in the abstract of their paper for the 2015 IEEE Symposium on Security and Privacy, a system doesn’t have to go all the way back to square one to forget:
To forget a training data sample, our approach simply updates a small number of summations — asymptotically faster than retraining from scratch. Our approach is general, because the summation form is from the statistical query learning in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
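To make the “small number of summations” idea concrete, here is a minimal sketch in Python. It is not Cao and Yang’s implementation. It assumes a toy naive Bayes classifier (an illustrative choice) whose entire state is a set of running counts, and the class and method names are hypothetical. Forgetting a sample subtracts that sample’s contribution from the counts it touched, and the result matches a model retrained from scratch without it.

```python
from collections import Counter


class SummationNaiveBayes:
    """Toy naive Bayes classifier whose entire state is a few running sums.

    Illustrative only: the names here are hypothetical, not from the paper.
    """

    def __init__(self):
        self.label_counts = Counter()    # number of samples seen per label
        self.feature_counts = Counter()  # samples per (feature, label) pair

    def _update(self, features, label, sign):
        # Each sample adds +1 (learn) or -1 (forget) to the sums it touches.
        self.label_counts[label] += sign
        for feature in set(features):
            self.feature_counts[(feature, label)] += sign

    def learn(self, features, label):
        self._update(features, label, +1)

    def forget(self, features, label):
        # Assumes this exact sample was learned earlier; no other data is re-read.
        self._update(features, label, -1)

    def snapshot(self):
        # Drop zero entries so two models with equal sums compare equal.
        return (
            {k: v for k, v in self.label_counts.items() if v},
            {k: v for k, v in self.feature_counts.items() if v},
        )

    def scores(self, features):
        # Unnormalized P(label) * product of P(feature | label), Laplace-smoothed.
        total = sum(self.label_counts.values())
        result = {}
        for label, n in self.label_counts.items():
            if n <= 0:
                continue
            score = n / total
            for feature in sorted(set(features)):
                score *= (self.feature_counts[(feature, label)] + 1) / (n + 2)
            result[label] = score
        return result


samples = [
    ({"coat", "axe", "beard"}, "firefighter"),
    ({"coat", "axe"}, "firefighter"),
    ({"axe", "beard"}, "lumberjack"),
]

model = SummationNaiveBayes()
for features, label in samples:
    model.learn(features, label)

model.forget(*samples[0])  # unlearn the first sample without retraining

retrained = SummationNaiveBayes()
for features, label in samples[1:]:
    retrained.learn(features, label)

# Compare the unlearned model with one retrained from scratch.
print(model.snapshot() == retrained.snapshot())
```

The cost of `forget` depends only on the number of features in the forgotten sample, not on the size of the training set, which is the sense in which updating summations can beat retraining in this sketch.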
Machine learning rests on a foundation built out of mounds and mounds of information. That data helps teach robots or artificial intelligences to make certain connections: for instance, that an individual in a heavy coat wielding an axe might be a firefighter. But training can also produce erroneous connections that reflect quirks of the data set. If every firefighter in the training examples happens to have a beard, your robot might conclude that all firefighters have beards. This, obviously, is something you’d want a computer to unthink.
Cao and Yang base this idea of robotic informational uncoupling on the concept of data lineage: the notion that data doesn’t spring fully formed into the world but has a traceable history as raw data is processed, as Kurzweil A.I. notes. Exploiting that lineage allows machines to unlearn selected pieces of data without completely wiping their education.