Dr. Jose Hernandez-Orallo is smart. The Technical University of Valencia data science professor is also an author — he just released a new book entitled The Measure of All Minds: Evaluating Natural and Artificial Intelligence — but he might be dumber than a toaster. It’s hard to say, because who’s to say how smart a toaster is? The answer, as of now, is “no one.” But Hernandez-Orallo is not content with that status quo. He believes that we can and will reconsider human, animal, and artificial intelligence as we better understand the emergent traits that, taken together, constitute being smart.
Inverse spoke to Dr. Hernandez-Orallo about the future of intelligence testing, how engineers would know if they created true A.I., and why robots are easier to understand if you think about rodents than if you think about human beings.
There’s a lot to unpack here so let’s start big. What’s the overarching theme of your book?
To bring some of the ideas from natural intelligence evaluation to the realm of artificial intelligence evaluation. It also brings some new ideas, based on algorithmic information theory, to the realm of psychometrics and animal evaluation.
The book tries to raise new questions and suggest new areas for future research.
Can you give me a little background thinking on why it is that an A.I. probably wouldn’t think, at least at first, in a very human-like way, if humans are creating these A.I.? Why wouldn’t human intelligence tests be the appropriate measure?
A.I. is very successful, but even today, with the relevance of learning, we still have a lot of very specific systems that solve very specific tasks. At the moment, we can properly evaluate whether a system solves the task, but we are also seeing a new generation of A.I. systems becoming adaptable. They learn from examples. They don’t need an engineer focused on specialization.
These machines are going to be not only calculators but also something more sophisticated that will extend our abilities beyond what we are able to do today. That means that we are witnessing a lot of new systems that we need to evaluate in a different way — or a new way because the systems are new.
You discuss why machines do poorly on intelligence tests. What’s the shortcoming, or is it more of a translation issue?
IQ tests have been developed for 100 years for humans, and they basically measure the variance in the human population. But when you go to computers, what they measure is basically meaningless because machines are not part of this population. The thing is that there have been several attempts to build systems that pass IQ tests, and they have been relatively successful. I just think that most people nowadays recognize that acing an IQ test is not a sufficient test for intelligence in machines.
You’re just specializing them for the test.
Right, I remember one, maybe two years ago there was a story about an A.I. that could scan a regular SAT test and understand most of it — or the math portion anyway — to the point that it could get a passing grade. It was interesting, but really more of a test of engineers’ ability to teach it to read diagrams.
IQ tests have been designed so that in about one or two hours you can have a good assessment of what is called general intelligence in some of the abilities in humans. For some tests, you can very easily construct a system that can pass them. For some of them, maybe for all of them in the future, you will be able to build some system that passes these exercises and gets good scores. But those systems are not fully intelligent in the way that we understand or we expect.
This is kind of the core question, but, I mean, what would an appropriate IQ test for an A.I. actually look like?
What we are interested in, instead of an IQ test or score for machines, is some kind of cognitive profile. We can have a machine and we can evaluate the cognitive ability of that system and, on occasion, we might also be interested in some personality traits of that system. That would be a cognitive profile or psychometric profile of a system. We are talking about general systems, not about a self-driving car. It doesn’t make sense to talk about the cognitive abilities of a self-driving car, which is a system that is very specifically built for one task.
Basically, the idea is that in the future, for some A.I. systems, we would be more interested in being able to define tests that can just get that cognitive profile and we can compare systems according to these cognitive profiles. For instance, in the future, we may have two systems and we can say, okay, this system is a much better basic learner but this other system is better at planning or this other system is better, for instance, in natural language understanding or things like that. Not in terms of specific tasks, but in terms of abilities.
How does this all compare with animal models?
I think that, with the level of A.I. at this moment, it is more interesting to look at how animals are evaluated. How can we evaluate cognitive abilities in rats and robotic rats in terms of the ability to learn, or the ability to plan? There has been some research with this comparison, not in terms of abilities but in some particular tasks.
It is very difficult to evaluate or to compare to animals in terms of the tasks that they do. What we can do is see whether they’re able to solve some kinds of tasks in terms of abilities. And that’s, of course, a difficult thing because we have to arrange the space of abilities. But this has been done in psychometrics from experimental data and it has also been done in comparative psychology for animals. There are many theories but there’s some kind of agreement about how to go from very specific abilities to more general abilities and, in the end, general intelligence.
It’s interesting you bring up animals when we talk about this general adaptability. It’s striking that there are some lower-order animals that probably wouldn’t hit that threshold, like certain reptiles. If you were to drop them into an unfamiliar situation, they would be basically incapable of dealing with it. I mean, from that perspective, are computers starting to approach the level of adaptability and general intelligence of those lower-order animals?
Yeah, I think that the parallel is quite accurate. It is difficult at this moment because we don’t have very precise evaluation tests or metrics. But there are people who say we are at the level of an insect. Some other people say that we are at the level of a reptile. Of course, we are not at the level of a mammal, in terms of A.I.
Still, I think it’s interesting to consider evolution. Evolution is based on very specialized systems that don’t learn at all but adapt to an environment. When you take an animal from an environment, it is basically hopeless in the new environment. That’s the same thing we see with some A.I. systems. They are not versatile. They are not adaptable. They basically break when you just change a little bit of the environment, just shapes or even the way that the problem is presented. But some other systems, some of the animals, they have developed this adaptability. Not physical adaptability but cognitive adaptability. We have seen that. It’s interesting in animals.
At the moment, we don’t have that yet in A.I., but that will come. I think it is a good parallel of the things that we will see in the future. We can talk about this ability-oriented evaluation in the machine kingdom, or in A.I., basically.
We’re talking about specifically creating intelligence tests for non-human intelligence, but it’s a human intelligence that is coming up with them. What are the implications of that contradiction, as far as difficulty and how we might expect that process to progress?
As we try to adapt an IQ test or ‘anthropocentric’ test to machines, we are going to fail in many ways. Of course, we can try to see or to devise tests about how similar machines are to humans. Some kind of measurement of humanness, whether a cognitive profile for a machine is similar to a human cognitive profile. We could do that.
What we can do is instead of deriving the difficulty of a task or an environment from a population of humans or population of animals, we define that notion theoretically. By doing that, we don’t depend on a population and since we don’t depend on a population, we can apply that concept to machines and humans and animals. In the end, we can know where humans really are in this space and not because they are in the center of the universe, but rather because they are just part of more general systems.
But I don’t think that we’re going to have good benchmarks for the cognitive abilities of machines in the next five or ten years. I think what we can have in about five or ten years is something better than what we have today.
When people think about this sort of generalized artificial intelligence, that’s pretty much the threshold where people start to get really worried. Is this going to be something that will be used to create a standard that we try to stay away from? If we have an ability to see what general intelligence would actually be, and how we might get there, could that become just a list of things to actively avoid?
Yeah, I think that’s a very good question because a very intelligent system can always conceal its intelligence. Basically, you can always underperform on a test. In terms of detecting a dangerous system, I don’t think that cognitive evaluation is very useful. I don’t really think that we can have some kind of psychiatry or something like that about A.I. systems, or know what they are going to get wrong in terms of evaluation.
People are talking about an exponential growth of super-intelligence or an exponential growth of intelligence, maybe. One cannot properly talk about that without a measure of it. Basically, we can think that this doubles every ten years, but [we] don’t have a real metric for what is doubling.
So, could this general A.I. be dangerous?
Yes, and there is an increasing research effort in A.I. to prevent this from happening. A better understanding of what intelligence is and how it can be measured can enormously contribute to this effort. For instance, any notion, any condition of intelligence, is linked to resources. So, computational resources, in the end, are linked to energy. One way of limiting A.I. is limiting energy consumption, or computational resources. That can be done.
We can only talk about a system controlling all other systems if we have a good understanding of the scale of intelligence and the relationship between intelligence and other cognitive abilities and domination. I don’t think that having a system ten times more intelligent than another system — if we can put a unit on it — is going to mean that the first system will dominate these other systems.