Combining human-centered design techniques with leading-edge technologies, new interfaces are moving from keyboards to touchscreens, voice commands, and beyond, transforming the way we engage with machines, data, and each other.
Today, people interact with technology through ever more intelligent interfaces, moving from traditional keyboards to touchscreens, voice commands, and beyond. Even these engagement patterns are giving way to more seamless, natural methods of interaction. For example, images and video feeds can be used to track assets, authenticate individual identities, and understand context from surrounding environments. Advanced voice capabilities allow interaction with complex systems in natural, nuanced conversations. Moreover, by intuiting human gestures, head movements, and gazes, AI-based systems can respond to nonverbal user commands. Intelligent interfaces combine the latest in human-centered design techniques with leading-edge technologies such as computer vision, conversational voice, auditory analytics, and advanced augmented reality and virtual reality. Working in concert, these techniques and capabilities are transforming the way we engage with machines, data, and each other.
At a dinner party, your spouse, across the table, raises an eyebrow ever so slightly. The gesture is so subtle that no one else notices, but you received the message loud and clear: “I’m bored. Can we leave yet?”
Most people recognize this kind of intuitive communication as a shared language that develops over time among people in intimate relationships. We accept it as perfectly natural—but only between humans. It seems a bit far-fetched—or, at least, premature—to suggest that machines might also be able to recognize the intent behind a subtly raised eyebrow and respond in contextually appropriate ways.
Yet in an emerging technology trend that could redraw—or even erase—boundaries between humans and computers, a new breed of intelligent interfaces is turning the far-fetched into reality. These interfaces are actually a sophisticated array of data-gathering, processing, and deployment capabilities that, individually or in concert, provide a powerful alternative to traditional modes of human-computer interaction. For example, using cameras, sensors, and computer vision, a retailer can track and analyze shoppers’ store movements, gaze, and behavior to identify regular customers and gauge their moods. By cross-analyzing this information with these customers’ purchase histories, the retailer can push promotions in real time to shoppers’ mobile devices—or, in the not-too-distant future, predict a need based on a customer’s subconscious behavior and preemptively place an order on her behalf.
In this example, the deployed technologies become an intelligent interface between users and systems. And this is only the beginning. Thermal imaging technologies can detect changes in shoppers’ heart rates. A variety of wearables ranging from today’s smartwatches to tomorrow’s augmented-reality goggles capture a wearer’s biofeedback. Smartphone data captured in real time can alert retailers that customers are checking online to compare prices for a specific product, suggesting dissatisfaction with store pricing, product selection, or layout.1
Such potential is fueling a growing demand for a broad range of human-machine interface devices. The global market for speech and voice recognition technologies alone could reach US$22.3 billion by 2024.2 The market for affective computing—another name for emotion-sensing software—is projected to reach US$41 billion in value by 2022.3
During the next two years, more B2C and B2B companies will likely embrace aspects of the growing intelligent interfaces trend. As a first step, they can explore how different approaches can support their customer engagement and operational transformation goals. Companies already on such journeys can further develop use cases and prototypes. Though investments of time, labor, and budget may be required before companies can begin reaping benefits, the steps they take during the next 18 to 24 months will be critical to maintaining future competitiveness.
Intelligent interfaces represent the latest in a series of major technology transformations that began with the transition from mainframes to PCs and continued with the emergence of the web and mobile. At each stage, the ways in which we interface with technology have become more natural, contextual, and ubiquitous—think of the progression from keyboards to mice to touchscreens, to voice and the consequent changes in the way we manipulate onscreen data.
Today, voice-user interfaces such as those found in mass-market products like Amazon’s Alexa, Google Assistant, Apple’s Siri® voice recognition software, and Microsoft’s Cortana are the most widely deployed type of intelligent interface. The ongoing competition among these tech giants to dominate the voice systems space is standardizing natural language processing and AI technologies across the interface market—and fueling innovation.4 Amazon offered a US$1 million prize through its annual Alexa competition to any team of computer-science graduate students building a bot capable of conversing “coherently and engagingly with humans on popular topics for 20 minutes.”5
Voice use cases are proliferating in warehouse, customer service, and, notably, field operations deployments, where technicians armed with a variety of voice-enabled wearables can interact with company systems and staff without having to hold a phone or printed instructions. Likewise, we are seeing more organizations explore opportunities to incorporate voice dialog systems into their employee training programs. Their goal is to develop new training methodologies that increase the effectiveness of training while shortening the amount of time employees spend learning new skills.
Though conversational technologies may currently dominate the intelligent interfaces arena, many see a different breed of solutions gaining ground, harnessing the power of advanced sensors, IoT networks, computer vision, analytics, and AI. These solutions feature, among other capabilities, computer vision, gesture control devices, embedded eye-tracking platforms, bioacoustic sensing, emotion detection/recognition technology, and muscle-computer interfaces. And soon this list also may include emerging capabilities such as brain-controlled interfaces, exoskeleton and gait analysis, volumetric displays, spatial computing, and electrovibration sensing.
To understand how these capabilities could work in concert in an enterprise setting, picture a widely distributed array of IoT sensors collecting data throughout a manufacturing facility and streaming it rapidly back to a central neural system. In many cases, these sensors function like a human’s senses by visually, haptically, and acoustically monitoring operational environments. For example, microphones embedded in assembly-line motors can detect frequency changes. Or computer vision monitoring those same motors can “see” a misconfigured part. Enter AI algorithms—acting as a logic-based brain—that derive inferences from the data generated by these and other sensors. The brain may infer that a specific assembly-line function is underperforming, and based on that identification, the brain/AI component of an intelligent suite of interfaces can respond. Moreover, by collecting, for example, manufacturing variances in real time versus in batches, the system can accelerate response times and, ultimately, increase operational throughput.
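The acoustic-monitoring step described above can be sketched as a simple anomaly detector: each motor maintains a rolling baseline of frequency readings, and a reading that deviates sharply from that baseline is flagged for the AI "brain" to act on. This is a minimal illustration, not a production algorithm; the readings, window size, and threshold below are hypothetical.

```python
from collections import deque
from statistics import mean, stdev

class MotorMonitor:
    """Flags assembly-line motor readings whose acoustic frequency
    drifts from the recent baseline -- a stand-in for the 'sensing'
    layer feeding a central AI-based system."""

    def __init__(self, window=50, z_threshold=3.0):
        self.readings = deque(maxlen=window)  # rolling baseline
        self.z_threshold = z_threshold

    def observe(self, frequency_hz):
        """Return True if the reading is anomalous vs. the baseline."""
        anomaly = False
        if len(self.readings) >= 10:  # need a minimal baseline first
            mu, sigma = mean(self.readings), stdev(self.readings)
            if sigma > 0 and abs(frequency_hz - mu) / sigma > self.z_threshold:
                anomaly = True
        self.readings.append(frequency_hz)
        return anomaly

monitor = MotorMonitor()
# Build a baseline from normal readings near 60 Hz.
for f in [60.0, 60.1, 59.9, 60.05, 60.0, 59.95, 60.1, 60.0, 59.9, 60.05]:
    monitor.observe(f)
print(monitor.observe(60.02))  # within baseline -> False
print(monitor.observe(72.5))   # sharp frequency jump -> True
```

In a real deployment, a flagged reading would be streamed to the central system rather than printed, and the threshold would be tuned per motor type.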
To be clear, skilled human observation, combined with machine data, still delivers the most robust and impactful understanding of manufacturing processes or retail operations. And with intelligent interfaces, the flow of information between humans and machines runs both ways (see figure 1). As we have examined in previous editions of Tech Trends, augmented reality (AR), virtual reality (VR), and mixed reality devices—which act as delivery vehicles for intelligent interfaces—are drawing upon a wide variety of data to provide users with information-rich, contextually detailed virtual environments.6 This represents a fundamental reordering of the way that people have traditionally used technology. Rather than being the beginning state of the human-machine interface, we are now the end state.
Intelligent interfaces offer B2C and B2B opportunities in several areas:
Any intelligent interface initiative involves underlying technology capabilities to bring it to life. As the fidelity and complexity of these experiences evolve, those foundational elements become even more critical. If you are collaborating with a colleague in a virtual environment via a head-mounted display, a 50-millisecond delay in a spoken conversation is annoying; if you find yourself waiting a full 10 seconds for a shared visual to load, you will probably lose confidence in the system altogether. Developing the supporting infrastructure necessary to harvest, analyze, and disseminate infinitely more data from more input sources will make or break experiences. There are also data syndication, capture, storage, compression, and delivery considerations, and this is where having an IT strategy for managing the backbone elements of intelligent interfaces will be crucial.
An effective strategy for prioritizing data, breaking it apart, processing it, and then disseminating to systems and network devices should include the following considerations:
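While the full set of considerations depends on the deployment, one recurring piece of such a strategy, triaging at the edge which events merit real-time transmission and which can wait for a batch upload, can be sketched as a simple routing policy. The event types, priority rules, and latency budget below are illustrative assumptions, not a prescribed architecture.

```python
# Sketch of an edge-side triage policy: latency-sensitive events go
# straight to the real-time stream; everything else is batched.
# Event types and thresholds are hypothetical examples.

REALTIME_BUDGET_MS = 50   # e.g., shared AR/VR interactions (see above)
BATCH_INTERVAL_S = 300    # bulk telemetry uploaded every 5 minutes

def route_event(event):
    """Return 'realtime' or 'batch' for a sensor event dict."""
    latency_sensitive = {"safety_alert", "gesture", "voice_command"}
    if event["type"] in latency_sensitive:
        return "realtime"
    if event.get("anomaly", False):  # flagged readings jump the queue
        return "realtime"
    return "batch"

events = [
    {"type": "voice_command", "payload": "status report"},
    {"type": "temperature", "value": 21.4},
    {"type": "vibration", "value": 72.5, "anomaly": True},
]
print([route_event(e) for e in events])
# -> ['realtime', 'batch', 'realtime']
```

The point of the sketch is that prioritization is a policy decision made close to the sensor, before data ever reaches the network backbone; the actual tiers would be driven by the experience's latency budget.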
Despite the potential of AR to entertain and educate the masses, a barrier to widespread adoption has been the challenge of developing an interface that is accessible, nondisruptive, and intuitive to use. Snap has found a way to attract hundreds of millions of daily users to its app using AR technology as a hook.
“Ultimately, we’re a camera company, but we focus on creating new ways of communicating through an intelligent and connected camera,” says technology VP Steve Horowitz.8 “Snap has leveraged AR to build a close and loyal relationship with our users; in turn, we’re constantly learning about how people interact with their devices, so we can evolve the Snap experience based on their natural interactions.”
Most of Snapchat’s users probably don’t think much about interacting with AR when they layer mouse ears onto their head, dub a higher voice tone, or add dancing llamas to a video’s background. This is because Snap’s choice of a smartphone’s camera as the interface is a familiar, comfortable tool that is almost always within reach. Snap adapted an interface that complemented users’ natural movements—leveraging sophisticated facial mapping and computer vision technology that creates 3D animations that rotate and expand. Yet the “lenses,” or filters, are easily accessible to users as they intuitively move the phone, point, and click. There is virtually no learning curve in creating, sending, and viewing a snap, and the result is immediate.
And Snap has been working with market leaders to change the boundaries of digital engagement—helping to make interactions seemingly effortless for consumers. These experiences combine digital reality technology with a cloud-based e-commerce platform and on-demand fulfillment. For example, customers can view products before they are released at geofenced events or virtually “try on” limited-edition merchandise with geofilters, make purchases without leaving the app, and have them delivered the same day.
What’s next for Snap? Technically, the company believes that cameras won’t stop evolving at the smartphone but will be incorporated into less disruptive tools such as camera glasses and other yet-to-be-invented devices. For engagement, Snap plans to continue to shape the future by delivering intuitive and creative AR experiences to its users.
Delta Air Lines made headlines in late 2018 by opening the United States’ first-ever terminal where passengers can get from the curb to the gate using only their face as proof of identity.9 Travelers flying out of Atlanta’s Maynard H. Jackson International Terminal F, direct to an international destination, can check in at kiosks, drop baggage at lobby counters, pass through security checkpoints, and board their flight using facial recognition technology.
The airline’s initial findings suggest that facial recognition decreases passenger wait times and can shave nine minutes off the time it takes to board a wide-body aircraft.10 The launch builds on Delta’s biometric boarding tests at ATL, Detroit Metropolitan Airport, and John F. Kennedy International Airport over the past several years, and the airline’s second curb-to-gate biometrics experience will expand to DTW in 2019. The airline hopes that implementing biometrics—including fingerprint, along with facial recognition—will improve and expedite the travel experience. In addition, Delta aims to improve customers’ interactions by streamlining operations and empowering employees with technology so they have more time to engage with customers more meaningfully.
“We want to leverage high-tech to drive high-touch,” says Matt Muta, Delta’s Operations Technology & Innovation vice president.11 “Our employees are our greatest competitive advantage, so a big part of our approach is to empower them with tech tools that boost their ability to deliver an empathetic, seamless travel experience—that’s the Delta difference our people are known for around the globe.”
Muta works with operational teams throughout Delta’s divisions, and at the airline’s midtown Atlanta global innovation center called The Hangar—a startup-like setting that tests and scales proven technologies.12 The team uses a design-thinking process, which helps them dig in and understand the problem, quickly model, and deploy. To understand travelers’ and employees’ challenges, team members engage heavily with Delta’s employees, business, and technology partners across the organization.
Since its inception in 2017, the Hangar team has explored ideas, including how to use technology to help Delta One customers select meals before their flights; interactive voice solutions that offer travelers flight information; real-time flight communication devices for pilots, flight attendants, and gate agents; a gate interface allowing agents to perform tasks without a PC so they could be more mobile; a suite of technologies to study traffic in Delta’s Sky Clubs; and drone-enabled lightning inspections.
Within three years, Muta says, Delta will explore more technologies to interact intelligently with customers and employees, helping the airline engage better throughout the travel experience by further mobilizing its workforce and promoting consistent messaging. Delta’s people and their renowned service will remain at the core of the airline’s success as it explores more applications of artificial intelligence—including computer vision, machine learning, predictive analytics, and optimization. Muta is confident that the way Delta is approaching innovation and leveraging biometrics and facial recognition will set a standard not just for Delta but for the industry as a whole. As for the Terminal F biometrics, Delta is drawing the industry’s blueprint for the biometric customer experience of the future while capturing customer and employee feedback, refining and retooling processes as it scales out its intelligent interfaces in new locations.
We’ve seen a lot of media coverage on artificial intelligence in the last few years, often focusing on how the technology might cost people their jobs. But I think a much more exciting possibility is a future in which people are augmented with intelligent interfaces—thereby elevating and combining human decision-making with machine intelligence. At the lab, we like to talk about intelligence augmentation rather than artificial intelligence, and we view the future of interaction with our devices as one that is more natural and intimate.
There are three ways in which we would like to see our devices change. First, we live in two worlds now—the physical and the digital—and they’re not well integrated. We are constantly forced to multitask and shift our attention from one to the other. Second, while today’s personal devices provide access to the world’s information, they don’t do much to assist with other issues that are important for being successful, such as attention, motivation, memory, creativity, and ability to regulate our emotions. And third, our devices today pick up on only the very deliberate inputs that we give them through type, swipe, and voice. If they had access to more implicit inputs such as our context, behavior, and mental state, they could offer assistance without requiring so much instruction.
Today’s devices are aggregating more and more information about users. But in the future, they will also gather data on the surrounding environment and current situation, perhaps by analyzing what we are looking at or sensing what our hands are doing. This context will enable our devices to provide us with data based on explicit intent and well-defined actions, as well as our state of mind, unspoken preferences, and even desires. These systems will gain an increased awareness about the user and their context and will form predictions about the user’s behavior and intentions.
Devices will be able to learn from their interactions with us, which over time will yield much more efficient decision-making and communication between human and device. I often joke that the device of tomorrow will know each of us better than our spouse, parents, or best friends because it will always be with us, continually monitor us, and be able to detect even subtle cues from our behavior and environment. Are we focused or absent-minded? What is our stress level? Are we in physical discomfort from a medical condition or injury? All these factors very much affect engagement but are almost impossible to quantify without improvements in sensing and understanding of the contextual signals around us. Current interfaces such as a computer keyboard or mouse do not adjust automatically to those kinds of cues.
To optimize the delivery of data, interfaces, as we know them, must evolve. Today, receiving information from devices is disruptive: The user needs to stop what they’re doing in order to receive the message, decide what to do with the information, and then indicate to the phone, tablet, or laptop what they would like to do next with a keystroke, swipe, or voice command. Using olfactory, visual, and auditory display technologies, as well as electrical and vibration stimuli, devices will be able to communicate with us in ways that do not require our full attention. We can perceive and process stimuli—such as smells or sounds—while we focus on the document we’re typing or the TV show we’re watching—without deliberate rational thinking.
Our goal at the lab is not only to enable seamless input of all manner of user data into devices but to enable users to act on the data and insights the devices provide in a way that is intuitive, nondisruptive, and comfortable. We need to create methods that will enable the user to accomplish certain tasks with minimal effort, time, and difficulty. We’re searching for more subtle ways to provide information without distracting users from what they’re doing, and that requires exploring the potential of all five senses.
Our Essence project, for example, explores the use of olfactory cues. The small clip-on device might sense the wearer’s declining alertness during a meeting and emit a burst of lemon or peppermint, scents demonstrated to increase attentiveness in humans. The intensity and frequency of the scent are based on biometric or contextual data. In another of our projects, AlterEgo, a wearable peripheral neural interface allows humans to “converse” with machines without using their voice and without “unplugging” from their environment. The device senses subtle signals when the user internally articulates words, without voicing them, and then sends audio feedback via bone conduction without disrupting the user’s ability to hear their surroundings. One of my students is even studying the validity of the so-called “gut feeling” by monitoring changes in stomach activity as an indicator of unconscious feelings or mental state.
Our devices are so much a part of our lives, but there is still a long way to go to fully and seamlessly integrate them in our lives. The opportunities for cognitive augmentation are tremendous, and the first step to exploring those possibilities is creating more intelligent, intuitive interfaces.
As I see it, the future of entertainment is a world where storytellers—big and small—are empowered to tell stories that couldn’t be told before. It’s a future in which we get to share the best practices Hollywood has to offer and allow storytellers from around the world to make their own dreams a reality. It’s a future with not just one centralized voice but many important disparate voices that haven’t been heard before. It’s a future where Innovation Studios empowers and disrupts the known industry to enable what was once impossible.
Innovation Studios, Sony Entertainment & Technology, was born out of Sony Pictures Entertainment and opened in June 2018. We have taken up residence in a 7,000-square-foot soundstage on the Sony Pictures studio lot and are using Sony’s latest research and development to help storytellers around the world create content for today and the future.
This is incredibly important because somewhere around the world is a voice that yearns to be heard. Somewhere in Amman, Jordan, is a young woman with a story ready to share. Somewhere in the mountains of Morocco is a location yet to be seen. From Hollywood to Moscow are millions of magnificent places that speak to so many people.
What if we could collapse these great distances and reproduce the reality of each place in the most realistic of 3D experiences? What if we could film in—and feel like we are living in—these worlds ... while never having to leave our Culver City, California, stage? Well, we can. Volumetric video technology, built from quadrillions and quintillions of data points, much like the atoms that make up you and me, allows us to create moving environments at resolutions of more than 30,000 pixels, captured on 6K Sony Venice cameras. We can film live performers in virtual worlds with traditional motion-picture and television cameras.
Now you’re starting to see the potential of Innovation Studios to create virtual worlds with people in a multitude of locations while never leaving our volumetric stage. Volumetric storytelling offers filmmakers a realistic, immersive experience of any object in each space, from any viewpoint, with parallax that behaves like the physical world. The technology we’re using allows for real-time visual effects, so the real fuses with the unreal.
When we can capture the analog world synthetically in a resolution that is beyond anything possible with today’s cameras, it gives us the opportunity to do more than entertain—we can preserve and protect monuments and locales, to celebrate humanity and the Earth. This technology also offers inherent cost savings to the industry: Rather than spending millions to send a cast and crew of hundreds to a location to film for weeks or months, or to rebuild sets for a blockbuster sequel, we can send a small crew to a location or set to shoot the images and preserve them forever for reuse.
I’m a firm believer that we should celebrate the US$200 million film and all the technical prowess that goes into making it, but I’d like to be able to say that I helped create a world where that kind of innovation wasn’t limited to big-budget films. I’d like anyone who is making any form of content to have access to these capabilities. Technology shouldn’t be for the best-financed content creators—it should be for all content creators, because in the end, we all benefit from the stories others tell.
We see the potential value in this technology not just for the next generation of filmmakers but for equipment manufacturers, governments, health care providers, educators, the aerospace industry, art dealers, and museums. We’re working now with engineers throughout Sony to pursue opportunities to eventually partner with other industries that could benefit from leveraging the technology.
We’re working on technology that saves money, increases opportunity, expands horizons, and enables dreamers. The words “What if?” used to be the two most expensive words in the film industry, but now they’re the most cost-effective words. If you have the asset, you can say, “Sure, let’s do it!”
Today, data is the currency that runs through a digital ecosystem, and interfaces are the tools that help us interact with that data. Enterprises are recognizing the many use cases for intelligent interfaces in mitigating cyber risks in their systems and networks. Yet as the deployment of intelligent devices grows—in airports, health care, schools, manufacturing, and retail—organizations need to consider the potential cyber risks they pose to users and the organizations that host them. Companies should implement appropriate security measures with respect to accessing the interface and, subsequently, the data collected from and sent to the interface.
Intelligent interfaces help mitigate cyber risk in a variety of applications and across multiple industries. Take the use case of biometrics: The benefits of facial recognition, retinal scans, and fingerprints as identification in airport security checkpoints and border protection, for example, are obvious. Using biometrics unique to one person provides a more reliable, accurate, and expedited validation of identification and citizenship, thereby increasing the safety of the general public. These same characteristics make biometrics the de facto method of securely accessing smartphone devices. Because biometrics are irrevocable—you can’t easily alter the patterns of an iris or the characteristics of a fingerprint—they don’t require upkeep like a password and are far more difficult to steal.
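To make the contrast with passwords concrete: password checks are exact, while biometric checks are probabilistic, comparing a fresh capture against an enrolled template and accepting it if the two are close enough. The toy bit-string templates and threshold below are purely illustrative, not a real biometric algorithm.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def biometric_match(enrolled: str, capture: str, max_fraction=0.2) -> bool:
    """Accept if the fresh capture differs from the enrolled template
    in at most `max_fraction` of its bits. Real systems use far larger
    templates (e.g., iris codes) and carefully tuned thresholds."""
    return hamming_distance(enrolled, capture) / len(enrolled) <= max_fraction

enrolled = "1011001110100101"  # toy 16-bit template
print(biometric_match(enrolled, "1011001110100111"))  # 1 bit off -> True
print(biometric_match(enrolled, "0100110001011010"))  # inverted  -> False
```

The fuzziness of the match is exactly why a stolen template is so dangerous: unlike a password, it cannot be rotated, and any capture "close enough" to it will authenticate.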
However, because of this unique permanence, the repercussions of biometric data being compromised could be devastating: The same identity markers used to make access more secure and efficient, in the event of a breach, create multiple layers of risk. These can include:
To address these cyber risks, organizations should establish data governance models from the beginning that define the value of their data, its ownership, where it resides, and how it will be used—and that help ensure they are making ethical decisions in the use and destruction of that data.
Additionally, organizations should put in place a data risk management program that identifies the value that data has to bad actors, as well as sensitivities and vulnerabilities of the data they are consuming and processing through these interfaces. It is critical to put controls in place so that high-value data is secured no matter where it is located: on premises, at a remote data center, or in the cloud. Companies should anonymize information if it’s irrelevant to the use of the data, establish additional boundaries for handling data that is transferred externally to third parties, and, finally, put thoughtful consideration into data deletion and archival.
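As a minimal sketch of the anonymization step mentioned above, one common related technique is pseudonymization: replacing direct identifiers with salted one-way hashes before data leaves the collection point, so records can still be linked to each other but not traced back to a person without the separately stored salt. The field names and record below are hypothetical.

```python
import hashlib
import secrets

# Fields treated as direct identifiers in this hypothetical schema.
PII_FIELDS = {"name", "email", "face_id"}

SALT = secrets.token_bytes(16)  # per-deployment secret, stored separately

def pseudonymize(record: dict) -> dict:
    """Replace identifier fields with salted SHA-256 digests;
    analytics fields pass through untouched."""
    out = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            out[key] = digest[:12]  # truncated for readability
        else:
            out[key] = value
    return out

record = {"name": "Jane Doe", "email": "jane@example.com",
          "visit_count": 7, "store": "ATL-04"}
safe = pseudonymize(record)
print(safe["visit_count"], safe["store"])  # analytics fields untouched
print(safe["name"] != record["name"])      # identifiers replaced -> True
```

Note that pseudonymized data is still regulated personal data in many jurisdictions; full anonymization requires stronger guarantees, such as aggregation or differential privacy.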
The intelligent interfaces trend represents an opportunity to use converging exponential technologies to understand customers more deeply, enhance operational efficiency, and create highly personalized products and services. But as every CIO knows, the devil is in the details. Some industries, such as retail and manufacturing, are currently at the vanguard of intelligent interfaces adoption. Other industries? Time will tell. As you explore the opportunities and potential pitfalls of intelligent interfaces, consider the following questions:
My budget is limited. How can I show a return on this kind of investment sooner rather than later?
Intelligent interface initiatives will almost certainly require investing in hardware such as sensors, head-mounted displays, and microphones. These are not as-a-service plays—they are hard costs. And while an individual sensor is inexpensive, adding up the number of sensors needed to monitor a manufacturing facility may suggest a very different level of investment. While this may give you pause, you should avoid falling into the common trap of viewing IT solely as a cost center. CIOs often view initiative costs linearly: Investments in emerging technologies will deliver specific outcomes within certain time periods. Perhaps a more effective—and accurate—approach might be to look more broadly at how investment costs could amortize across operations and returns. In today’s interfaces marketplace, there is a gap between what these nascent technologies truly do and how their costs are justified. As you explore opportunities in your company, think critically about what you want to achieve, how this trend may help you achieve it, and about the level of commitment you are willing to make.
What skill sets will I need?
With intelligent interfaces, human bodies become the instruments for creating commands, with users wearing devices on their bodies that constantly track movements, voices, and gazes. For this reason, human-centered design skills will likely be more important than ever to IT organizations. For example, people with medical backgrounds understand the way bodies function and process stimuli. Linguists might offer insight into what constitutes an effective voice conversation for humans, and what humans would respond to in terms of a computer-generated response. Physical therapists could bring specialized expertise to the development and use of haptic technologies.
In addition to those with human-centered skills, technology generalists or “connectors” will also play an important role going forward. These are people who deeply understand the full range of intelligent interface technologies and, more broadly, how they interact with one another. They will understand how to deploy these technologies in combinations that fuel the levels of growth the trend promises.
At a fundamental level, the intelligent interfaces trend involves understanding the behaviors of customers and employees in much greater detail than ever before. Should I be concerned about privacy issues?
Yes. In terms of privacy, tracking users’ online behaviors is one thing. Tracking both online and offline is a fundamentally different proposition—one that many users may find invasive and unacceptable. How this will play out in terms of regulation remains to be seen. But for now, companies can take steps to make sure all interfaces—particularly those, such as AR or VR gear, that are designed primarily for the consumer market—are deployed consistently with enterprise standards of privacy and security. For example, when using headsets in the workplace, you don’t want to capture co-workers’ faces for extended periods of time, so users need the ability to activate the headsets only when necessary. This same consideration applies to voice interfaces: How do you determine what conversations should or should not be recorded? Microphones in popular virtual assistants in the consumer market are always on, which may be acceptable in some enterprise deployments but surely less so in retail or home settings. As part of any intelligent interfaces deployment, it will be necessary to put checks in place before data is gathered or processed to help ensure that individual privacy is respected consistently.
There are so many hardware and software players in this space right now. Should I wait until standards and a few dominant platforms emerge?
No. The solution space is fragmented, but growing numbers of companies—some of them likely your competitors—are developing use cases and exploring ways that the intelligent interfaces trend might add value. You can follow their lead or develop your own in-house use cases. Either way, your efforts can and should be contained in an innovation portfolio where the costs are understood as research. Your programs can be quick-cycling, seek immediate user feedback, and can be catalogued as impactful or not (and ideally tied to KPIs that can be identified and projected). You can then measure and evaluate these efforts with a go/no-go. Of course, developing use cases and exploring opportunities is only one piece of a larger digital transformation puzzle. Bigger picture, your company needs a coherent innovation strategy that incorporates rapidly evolving, fragmented ecosystems and unproven use cases today and in the future. In the end, nimble innovation makes it possible for companies to try, fail, and learn.
Unlike many technology trends that present new ways to streamline processes or engage customers, the intelligent interfaces trend offers something much more personal: an opportunity to fundamentally reimagine the way we, as humans, interact with technology, information, and our surroundings. To say this trend is potentially disruptive would be an understatement—simply put, it represents the next great technology transformation. And this transformation is already underway. If you are not exploring the role that voice, computer vision, and a growing array of other interfaces will play in your company’s future, you are already late to the game.