One year ago today, Facebook released its “On This Day” feature. Inverse spoke with Facebook’s Computer Vision Research Lead Manohar Paluri about how artificial intelligence, machine learning, and computer vision make this feature more meaningful — and how these areas of research and development will continue to improve the Facebook experience in years to come.
Even if you haven’t used the On This Day feature yourself, you’ve seen these posts around your News Feed; you’ve seen a friend re-sharing an event from his or her Facebook past: “Can’t believe it’s been three years since that magician pulled a rabbit out of a hat!” paired with a photo of said magician pulling said rabbit out of said hat. Something akin to that. And today, Facebook’s sharing its own memory. On this day, one year ago, Facebook launched On This Day. (On This Day now boasts more than 60 million daily visitors, and 155 million people subscribe to its notifications.)
But for Facebook, this memory is less a sentimental occasion than a milestone. Facebook consistently rolls out new features, and these features are consistently examined and tweaked. Sometimes it’s human beings, like Paluri and his team, who do the tweaking; other times it’s A.I.s. Most times, though, it’s symbiotic. Facebook is like a cyborg, and this cyborg has one raison d’être: to make your Facebook experience as pleasant as possible.
The computer vision, content understanding, and A.I. squad at Facebook could be seen — if you will — as the cyborg’s motherboard. And Paluri, to continue the metaphor, is sort of the central processing unit for that motherboard. Paluri’s been working in computer vision for over a decade, and he’s no small fry: he started at SRI, moved on to IBM Watson labs, and from there hopped over to Google. And now he’s in Menlo Park at Facebook. When he joined, his internship project in visual recognition wound up as the “backbone,” he says, of Facebook’s image and video understanding technology. And that visual recognition engine is becoming more and more central to Facebook.
“If you look at usage of Facebook over time — and this is an example that Mark [Zuckerberg] also quotes often — you see richer and richer media being shared, and people use that to connect,” Paluri says. “You start from text, you go to photos; from photos you go to videos, and from videos we are now going to VR. As the communication medium becomes richer and richer, it’s also important that the tools catch up, that the tools understand what this content is. Unless we have that, we will not be able to do better in News Feed ranking, we will not be able to do better in retrieval of search, we will not be able to do better in describing photos for blind people, we will not be able to build better population density maps.”
The relatively new centrality of artificial intelligence, machine learning, and computer vision, Paluri says, is a bit of a “strategic bet” — but a bet that excites him. Nowhere else that he’s worked has such a tight feedback and response loop between research and engineering. “By centralizing it, we process with the state-of-the-art, we push the state-of-the-art, and then the product teams and the rest of the company can latch onto it,” he says.
Now, Paluri manages the computer vision team. “The high-level goal for the team is to make machines see the way humans do,” Paluri explains. “And go beyond, actually — go beyond what humans are capable of, towards, like, fine-grained recognition, for example. We publish our findings in top conferences, we write technical blogs, and we are very open about what we are working on. Overall, our main goal is to bring computer vision technology to the rest of the product groups at Facebook.”
And the premier product that’s reaping Paluri’s team’s harvest just so happens to be On This Day.
Behind the simple, innocent veil that is On This Day lies a complex A.I. and computer vision system that fine-tunes your mnemonic experience. Paluri, who is only tangentially tied to On This Day, explains why reliving social network memories can be a good thing:
“Nostalgia is a very positive phenomenon. So, seeing your wedding photo, for example, in an impromptu way — when you’re not specifically browsing for it, but it just shows up on your News Feed — is an extremely pleasing experience. Especially when you’re browsing in present, and a positive memory comes out from the past.”
Yet there is undoubtedly nostalgia that falls more on the bitter side of the bittersweet spectrum. “The first thing that comes to mind,” Paluri says, is: “Should you surface all the memories? The intuitive answer is no, because it depends on your current state, it depends on that specific memory; there are many, many intrinsic things. That’s where the A.I. technology comes into the picture.”
And there are two ways in which the A.I. comes in, here: one, personalization; two, content understanding.
With respect to the latter, content understanding: “These memories are text memories, life events, photos that you uploaded, or videos that you uploaded. So, now you have this plethora of content which is of different modalities, and understanding what is in there is extremely important to be able to learn and provide the right set of memories.”
In addition, and not just for On This Day, content understanding and these A.I. systems help to wade through the overwhelming amount of information that’s posted on Facebook every day. (Think about it: if Facebook’s News Feed resembled that of Instagram, you’d see maybe two percent of all posts. Instead, you’re met with content that you’ll probably like, or content that you’ll spend a lot of time imbibing.) They also help filter out objectionable content, like pornography, more than almost any other site online.
And with respect to the former, Paluri elaborates: “For you, maybe, looking at positive memories is good, and you don’t like anything negative. But for someone else, maybe they want to be reminded of the fact that they lost their cat on this day. Even though it’s a loss, it brings them a positive memory.” And, in a sense, each Facebook user has a highly personalized, behind-the-scenes profile that knows what he or she will or will not want to reminisce about. “As you interact with the memories — as you share, as you like, or as you dismiss — there is a machine learning model that uses the content understanding module, along with your preferences, and personalizes the future memories that will be served to you.”
But don’t fret: Facebook wants to make sure that you’re not rudely reminded of a breakup, or a relative’s passing. “No matter how good the A.I. or machine learning technology is, we will still want to give control to the user, because at the end of the day, our goal is to resurface memories that they like.” Users get an override switch: “If they know that, between these dates, a negative thing happened — they broke up, or something — we want to give them full control on not surfacing those memories.”
Within the preferences for On This Day, then, you can say “Don’t show me memories with so-and-so” [because he is a despicable human] or “…from the past three years” [which were miserable and in no way noteworthy].
Looking forward, Paluri explains why he’s thrilled to continue developing these systems and improving the quality of Facebook’s motherboard.
You’ve mentioned other applications already for vision and content understanding systems within Facebook. Is there anything else that’s still in the works — that employs these systems — that excites you?
All these capabilities on videos is something that excites me, for sure. That definitely already exists; it’s an ongoing thing, because video is pretty big on Facebook. But I think, at some level, we want to get richer and richer at understanding it. The current computer vision technology is still not there in terms of describing images the way humans do. It might tell you that this photo has these things, that this is the pixel that belongs to the cat, and so on — but it’s limited. It still doesn’t understand the relationship between things, and it still doesn’t describe it in a human way.
There is some work out there that describes images — it’s called image captioning. There are a bunch of works that came out in the last two years. But, if you look at the captions that these systems generate, they’re very general. They’re not descriptive. One of the things that we would like, and that is going to come in the future from our side, is to describe them in a much richer way. Both for images and video. If you have a two-minute video, you don’t want a one sentence description; what you want is a paragraph with a time-sense to the description, right? ‘This happened, then this happened, then this happened,’ right? That is a good understanding.
So, you’re looking to knock me out of my job, you’re saying. In short.
[Laughs] No, definitely not. I’m making your job more interesting.
Do you feel like Facebook is kind of an odd place for this research to be happening, or is it a perfect place?
I think it’s a perfect place, because content understanding is in the DNA of Facebook. If you look at the explosion of Facebook usage, News Feed is one of the pillars that allowed Facebook to be an amazing social network compared to many other competitors. News Feed, still, is the main distribution channel.
But when you come to News Feed, you don’t come with a specific intent. You come there for information. So, it’s kind of important for us to show you the right things, to show you meaningful things. If you are going to other services, maybe you are going with an intent, in which case all the service needs to do is give the answer. Here, it’s like I am giving you the question and I am giving you the answer. So, you need to be really, really good for somebody to keep coming back.
That’s why A.I. and content understanding is at the core of Facebook, and why this is the best place for it. Given how much media there is — given how much content on Facebook is about images and videos, and the shift towards more and more video and VR — it’s the best place to be doing A.I. research, computer vision, and machine learning.
It’s not an odd place: it’s the place.