Every year, my extended family gets together for a picnic at which we always take a group picture. And every year, the shot is ruined by something in the background: a street lamp sticking out of my aunt's head, strangers caught in the flash, cars intruding into peaceful outdoor settings.
But with a few clicks in Photoshop, I can easily remove that street lamp without decapitating my aunt. Algorithms analyze the photo, determine how it would look if some object weren't there, and then--voilà!--the image is fixed.
This trick seems simple, but it's an example of sophisticated machine-learning algorithms at work, and a signal of what's just over the horizon. A slew of new technologies will soon fill in the blanks not just in our photos but in many other places. They will automatically tell us who and what we're seeing, and they'll anticipate how people will react in a given situation. Recognition algorithms, which analyze an image of, say, a tennis racket and identify it as one, will eventually be integrated into the devices consumers use every day. For good or ill, they'll also be built into behavior-monitoring systems sold to businesses.
Within the next 24 months, consumers will have access to rudimentary versions of these technologies. Pinterest recently launched Lens, a sort of Shazam for objects; Blippar, which Inc. wrote about last month, has similar capabilities. Samsung's upcoming Galaxy S8 will also let a phone's camera act as a visual search tool, and this kind of on-the-fly recognition lays the groundwork for what comes next. Today, Facebook, Apple, and Google auto-generate movies from content you upload. In future iterations, completion algorithms will pull footage from co-workers and social media connections, stitch it together with your own, and automatically create videos of events like work retreats and family picnics.
Meanwhile, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed an algorithm that can forecast how people will interact with one another. Capture two people in a video frame, and it can predict whether they will shake hands, hug, kiss, or ignore each other. CSAIL researchers have also trained a deep-learning algorithm that recognizes human activity well enough to generate short videos from single images. Starting from a photo of a trainer, a dog, and a jockey riding a horse, the system auto-completed a video of the trainer leading the horse into position, complete with realistic-looking people, trees, grass, and animals, convincingly depicting a scene that never happened. Such developments point to a near future full of fascinating applications, in which smart algorithms produce videos that foresee, for example, how kids from Atlanta might respond to a new potato chip flavor or how older Harley riders might react to a near-accident.
What all of this points to: Machine vision will soon be powerful enough to use our past behavior to predict what we'll do next, making it easier to observe crowds at events, monitor employees at work, and track customers as they shop.
By then, algorithms will focus not just on what but on who, pairing our faces with the data we generate on social networks and mobile apps. Smart cameras will detect a customer who's likely to engage with your company's sales rep if approached, then feed that rep key data points about that customer in real time. The privacy concerns these technologies stir may well leave us all feeling a little vulnerable. But perhaps the eventual killer app won't be a nostalgic video maker but a real-time information system that helps us relate better to one another, one that could ultimately do more for all of us than fixing an annual family photo.