In a WWDC 2020 developer session called "Detect Body and Hand Pose with Vision," led by engineer Brett Keating, Apple revealed significant updates to its Vision framework. Developers building for iOS 14 and macOS Big Sur will be able to use more detailed hand and body pose detection. Though not immune to errors, this joint-based detection allows for more advanced gesture triggers and body tracking.
What to expect — Both hand and body pose detection rely heavily on joints and other key points. Hand pose detection covers all of a hand’s joints as well as a single wrist point and the tips of all the digits. Keating demonstrates a potential use case by touching the tips of his index finger and thumb together to draw the word “hello” in the air as though using a stylus.
Other examples include using gestures to trigger a timer for taking a photograph and generating emojis to match real hand poses.
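To make the pinch gesture from the demo concrete, here is a minimal, framework-independent sketch of the underlying check: the thumb tip and index tip count as touching when both are detected with reasonable confidence and sit close together. It assumes each fingertip arrives as a point in normalized image coordinates with a confidence score. The `Point` class, the `is_pinching` helper, and both thresholds are hypothetical choices for illustration, not part of Apple's API or the method shown in the session.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """A detected joint in normalized image coordinates (0.0 to 1.0).

    Hypothetical container for illustration; not a Vision framework type.
    """
    x: float
    y: float
    confidence: float


def is_pinching(thumb_tip, index_tip, max_distance=0.05, min_confidence=0.3):
    """Return True when both fingertips are trusted and close together.

    Both thresholds are assumed values that would need tuning per app.
    """
    # Ignore frames where either fingertip was detected with low confidence.
    if thumb_tip.confidence < min_confidence or index_tip.confidence < min_confidence:
        return False
    # Euclidean distance between the two fingertips in normalized units.
    distance = math.hypot(thumb_tip.x - index_tip.x, thumb_tip.y - index_tip.y)
    return distance <= max_distance


# Made-up readings for two frames: fingertips together, then apart.
touching = is_pinching(Point(0.50, 0.50, 0.9), Point(0.52, 0.51, 0.8))
apart = is_pinching(Point(0.50, 0.50, 0.9), Point(0.70, 0.60, 0.8))
```

An app drawing in the air would likely run a check like this on every frame and add the fingertip position to the stroke only while the pinch holds.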
Body pose is slightly less intricate, but it still covers all the major joints and a few key points on the head. This detection allows for fun video editing, fitness tracking of reps or proper form, and safety training for jobs involving manual labor. It could also be used to search for videos based on activity, like dancing or working out.
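As a rough illustration of how rep counting could be built on top of body joints, the sketch below computes the angle at a joint (for example the knee, from hip, knee, and ankle points) and counts a rep each time that angle dips below one threshold and then rises above another. The function names and the 100 and 160 degree thresholds are assumptions chosen for the example, not anything specified by Apple.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Each point is an (x, y) tuple, e.g. hip, knee, ankle.
    """
    raw = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    raw = abs(raw)
    # Fold reflex angles back into the 0-180 degree range.
    return 360 - raw if raw > 180 else raw


def count_reps(angles, down_below=100.0, up_above=160.0):
    """Count down-then-up cycles in a sequence of per-frame joint angles.

    Two separate thresholds (assumed values) keep small jitter around a
    single cutoff from being counted as extra reps.
    """
    reps = 0
    is_down = False
    for angle in angles:
        if not is_down and angle < down_below:
            is_down = True          # reached the bottom of the movement
        elif is_down and angle > up_above:
            is_down = False         # returned to the top: one full rep
            reps += 1
    return reps


# Made-up knee angles across nine frames of two squats.
squat_reps = count_reps([170, 150, 95, 80, 120, 165, 170, 90, 170])
```

Checking form could follow the same idea, comparing joint angles against an expected range rather than counting threshold crossings.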
Both forms of pose detection also support tracking multiple hands and people. While tracking multiple people might not be at the top of developers’ lists in a socially distanced world, it could eventually inform group fitness activities. It could also be useful to editors making music videos or movies, or it could finally make it possible to get a successful group action shot, such as several people doing handstands, cartwheels, or jumps.
It’s not perfect — Hand pose detection has some trouble with people wearing gloves, hands near the edge of the screen, and hands parallel to the viewing angle. More perplexingly, feet are occasionally recognized as hands. Body pose tracking also struggles when a person is too close to the screen’s edge, when people overlap, or when they are obscured by other objects. Other possible pain points include people who are bent over, upside down (okay fine, no handstands then), or wearing loose-fitting clothing. That’s not going to be a problem for the yoga pants brigade or athleisure crowds, though.
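One way an app might cope with these weak spots is to discard joints that are unlikely to be trustworthy before acting on them. The sketch below keeps only joints whose confidence clears a minimum and that sit away from the frame edge. It assumes joints are given as normalized coordinates with a confidence score; the `reliable_joints` helper, the joint names, and the threshold values are hypothetical.

```python
def reliable_joints(joints, min_confidence=0.3, edge_margin=0.05):
    """Keep joints that are confident and not hugging the frame edge.

    `joints` maps a joint name to (x, y, confidence) in normalized
    coordinates (0.0 to 1.0). Both thresholds are assumed values.
    """
    kept = {}
    for name, (x, y, confidence) in joints.items():
        # A joint is "inside" when it is at least edge_margin from every border.
        inside = (
            edge_margin <= x <= 1 - edge_margin
            and edge_margin <= y <= 1 - edge_margin
        )
        if confidence >= min_confidence and inside:
            kept[name] = (x, y, confidence)
    return kept


# Made-up frame: one solid joint, one near the edge, one low-confidence.
frame = {
    "wrist": (0.40, 0.55, 0.92),
    "index_tip": (0.99, 0.50, 0.85),
    "thumb_tip": (0.45, 0.60, 0.10),
}
usable = reliable_joints(frame)
```

A gesture trigger could then simply skip any frame in which a joint it depends on is missing from the filtered result.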
The technology has a long way to go, but it is still in its early stages, and its potential applications for AR, fitness, and video creation are only just beginning to be realized. In a generation or two of iOS releases, it'll doubtless be far more refined. Which is more than can be said for our cartwheels.