The average SAT taker in 2014 answered almost half of the test’s math questions correctly. At a conference in Lisbon on Sunday and in a paper published Monday, researchers from the Allen Institute for Artificial Intelligence and the University of Washington showed off a computer program that performs about as well.
It’s not the first time A.I. programs have been tasked with completing standardized test questions. And it’s not really a big deal that a computer can do math, since hey — almost every computer is capable of doing math (see: Calculators).
No, the reason this program, called GeoS, is impressive is that it was built to acquire and interpret the information on the test the way a human being would. It wasn’t solving the problems the way an actual calculator might. For the first time, a program was reading the problems off the page as an actual test-taker has to, making sense of all the text and wonky graphs and pictures laid out. GeoS then has to come up with its own problem-solving process, which basically means matching formulas it already knows to the problem and figuring out the correct answer. Like the average person might.
“Our method consists of two steps,” the researchers wrote in the paper. “Interpreting a geometry question by deriving a logical expression that represents the meaning of the text and the diagram, and solving the geometry question by checking the satisfiability of the derived logical expression.”
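To make those two steps concrete, here is a minimal Python sketch. It is not the GeoS code: the toy question, the names `LITERALS`, `is_satisfied`, `solve` and `CHOICES`, and the numeric tolerance are all assumptions made for illustration. The interpretation step is done by hand (the side lengths stand in for facts read from the text, the right angle for a fact read from the diagram), and "checking satisfiability" is reduced to plugging each answer choice into the expression to see whether every part of it holds.

```python
# Hypothetical toy question (not taken from the GeoS paper):
#   "Triangle ABC has a right angle at B. AB = 3 and BC = 4. What is AC?"
#   Choices: (A) 5  (B) 6  (C) 7  (D) 8

# Step 1, interpretation (hand-written here): the question becomes a
# conjunction of literals. Each literal maps a variable assignment to a
# residual that is zero exactly when the literal holds.
LITERALS = [
    lambda v: v["AB"] - 3,                           # from the text: AB = 3
    lambda v: v["BC"] - 4,                           # from the text: BC = 4
    lambda v: v["AB"]**2 + v["BC"]**2 - v["AC"]**2,  # from the diagram: right angle at B
]

def is_satisfied(assignment, tol=1e-6):
    """Return True if every literal holds under the assignment (within tol)."""
    return all(abs(literal(assignment)) <= tol for literal in LITERALS)

def solve(choices):
    """Step 2, solving: keep the choices that make the whole expression true."""
    known = {"AB": 3.0, "BC": 4.0}
    return [label for label, value in choices.items()
            if is_satisfied({**known, "AC": value})]

CHOICES = {"A": 5.0, "B": 6.0, "C": 7.0, "D": 8.0}
print(solve(CHOICES))  # ['A']
```

In this simplified version every variable is pinned down once a choice is substituted, so the check is plain evaluation. A real system has to cope with unknown coordinates and with uncertainty about what the text and diagram mean, which this sketch ignores.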
GeoS came up with a solution for only about half the questions it encountered, and scored 500 out of 800 on the SAT math section. Still, that’s just 13 points lower than the average high school senior. More impressive, on the questions it did answer, GeoS had a 96 percent accuracy rate.
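The figures above fit together with a little arithmetic, sketched here in Python. The one assumption is that "about half" is read as exactly 0.5; the function names are made up for illustration, and the result is a rough consistency check, not a reconstruction of how the SAT is actually scored.

```python
def overall_correct_fraction(coverage, accuracy):
    """Share of all questions answered correctly: attempted share times accuracy."""
    return coverage * accuracy

def implied_average_score(program_score, gap):
    """Average score implied by the program's score and its reported gap."""
    return program_score + gap

coverage = 0.5    # assumption: "about half the questions" taken as 0.5
accuracy = 0.96   # reported accuracy on the questions GeoS did answer

print(round(overall_correct_fraction(coverage, accuracy), 2))  # 0.48
print(implied_average_score(500, 13))                          # 513
```

Under that assumption GeoS gets roughly 48 percent of all questions right, which lines up with the "almost half" an average test-taker answers correctly, and the 13-point gap puts the average senior at 513.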
“Our biggest challenge was converting the question to a computer-understandable language,” said Allen Institute researcher Ali Farhadi in a press release. “One needs to go beyond standard pattern-matching approaches for problems like solving geometry questions that require in-depth understanding of text, diagram and reasoning.”
You can see and play around with the demo of the system at geometry.allenai.org/demo.
It’s important to emphasize here that the system still falls well short of replicating actual human problem solving. It doesn’t employ the abstract and common-sense reasoning that people are capable of.
GeoS is still a big step, though. The next hurdle will be to develop an A.I. system that successfully cheats through the SAT, raising that 500 by a plum 200 points or so. The new generation had better get ready to make friends with incoming freshman A.I. during orientation.