January 21, 2016

HoloLens Interaction Model

Microsoft HoloLens Team

Before we dive into talking about the interaction model, we wanted to share that today is the day we open up Twitter voting to select the winner for the Share Your Idea campaign!

The Share Your Idea campaign has generated over 5,000 ideas, which are now available to the developer community as an idea bank. If you want to build a holographic app but aren’t sure what to build, this is a great place to go to be inspired by the incredibly creative people who have chosen to participate.

With that out of the way, let’s dig in a little deeper into the ways that you can interact with HoloLens.

There are three key elements to input on HoloLens:

Gaze – What you’re looking at, and how you target
Gesture – An “air-tap” gesture that HoloLens will recognize, and which allows you to drive selection
Voice – Full access to the Windows 10 speech engine

Gaze is one of the most natural forms of interaction that we have. On HoloLens, at any time, the system knows where your head is in space, and what is directly in front of you. We leverage gaze to understand what you’re paying attention to at any given time, and to establish what both gesture and voice should be targeting.

As an example, in the Airquarium finalist, one of the core ideas is that you can look at any animal and issue a tap to find out more about it. When you issue that command, gaze is how the system knows which animal to give facts and stats about.

Gaze really is that simple, and that powerful. Of course, while you wear the HoloLens, you are generally moving around, so your gaze is also moving. The easiest way to think about it is as a ray cast from the device, from which you can determine which object (real-world, as represented in the spatial mapping mesh, or holographic) that ray intersects. Once you’re gazing at the animal, and you want to find out more, you need to tell the HoloLens to take an action. This brings us to gesture.
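To make the raycast idea concrete, here is a minimal sketch in Python. It is not HoloLens API code: the Sphere class and the gaze_target function are hypothetical names made up for illustration, and each object is approximated as a sphere rather than a real spatial mapping mesh. It casts a ray from the head position along the forward direction and returns the nearest object that the ray intersects.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Sphere:
    """Hypothetical stand-in for a hologram or mesh, approximated as a sphere."""
    name: str
    center: Vec3
    radius: float


def ray_sphere_distance(origin: Vec3, unit_dir: Vec3, sphere: Sphere) -> Optional[float]:
    """Distance along the ray to the first hit, or None if the ray misses.

    unit_dir must have length 1.
    """
    oc = tuple(o - c for o, c in zip(origin, sphere.center))
    b = sum(d * o for d, o in zip(unit_dir, oc))
    c = sum(o * o for o in oc) - sphere.radius ** 2
    disc = b * b - c
    if disc < 0:
        return None  # the ray's line never touches the sphere
    root = math.sqrt(disc)
    t = -b - root
    if t < 0:
        t = -b + root  # origin is inside the sphere; use the exit point
    return t if t >= 0 else None


def gaze_target(origin: Vec3, forward: Vec3, objects: Sequence[Sphere]) -> Optional[Sphere]:
    """Return the closest object hit by the gaze ray, or None if nothing is hit."""
    length = math.sqrt(sum(d * d for d in forward))
    unit_dir = tuple(d / length for d in forward)
    best, best_t = None, math.inf
    for obj in objects:
        t = ray_sphere_distance(origin, unit_dir, obj)
        if t is not None and t < best_t:
            best, best_t = obj, t
    return best


if __name__ == "__main__":
    animals = [
        Sphere("fish", (0.0, 0.0, 5.0), 1.0),
        Sphere("turtle", (0.0, 0.0, 10.0), 1.0),
    ]
    hit = gaze_target((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), animals)
    print(hit.name if hit else "nothing")
```

Because the head moves constantly, a real app would repeat this query every frame with the latest head pose.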

Gesture is the way to take a basic action in HoloLens. Simply raise your hand with your index finger extended, then tap down with that finger. Now you can target (with gaze), and act (with gesture). We use these two primitives to build up all of the standard interactions that you would expect to have with a screen today, but now freed to interact with holograms embedded into your world.
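The target-then-act pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GazeGestureInput and its methods are hypothetical names, not part of any HoloLens SDK, and the gaze and air-tap events are assumed to be delivered by some other layer. Gaze updates which hologram has focus, and an air-tap runs whatever handler is registered for the focused hologram.

```python
from typing import Callable, Dict, Optional


class GazeGestureInput:
    """Hypothetical dispatcher: gaze picks the target, air-tap acts on it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], None]] = {}
        self._focused: Optional[str] = None

    def register(self, name: str, on_select: Callable[[], None]) -> None:
        """Attach a selection handler to a named hologram."""
        self._handlers[name] = on_select

    def on_gaze(self, target_name: Optional[str]) -> None:
        """Called whenever the gaze ray lands on a new target (or on nothing)."""
        self._focused = target_name

    def on_air_tap(self) -> bool:
        """Run the focused hologram's handler. Returns True if one ran."""
        if self._focused is None:
            return False
        handler = self._handlers.get(self._focused)
        if handler is None:
            return False
        handler()
        return True


if __name__ == "__main__":
    inputs = GazeGestureInput()
    inputs.register("shark", lambda: print("Showing facts about the shark"))
    inputs.on_gaze("shark")   # target with gaze
    inputs.on_air_tap()       # act with gesture
```

Keeping the two events separate mirrors how a pointer and a click work on a screen: one moves focus, the other commits the action.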

Voice can really shine when you want to drive deeper interactions. Several things make voice a powerful input method for HoloLens:

Because HoloLens is a full Windows 10 device, the complete speech engine from Windows 10 is available to you as a developer.
Because the device is head-mounted, and we actually understand where your mouth is, we have been able to build array microphones into the device which produce a very high-quality audio signal for speech recognition.
Because gaze is present, you actually have better user context than voice-driven applications can attain today. It is now possible to understand the object or area of interest which a voice command is intended to target. Because the device can provide this context, the user doesn’t need to preface each command with what they’re looking at, which will allow deeper, easier voice-driven interactions than have been possible to date.
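The third point, gaze supplying the context for a voice command, can be illustrated with a small Python sketch. It does not use the Windows 10 speech engine: the recognized phrase is assumed to arrive as a plain string, and COMMANDS, NEEDS_TARGET, and resolve_command are hypothetical names chosen for this example. The command says what to do, and the current gaze target says what to do it to.

```python
from typing import Optional, Tuple

# Hypothetical phrase-to-action table; a real app would define its own.
COMMANDS = {
    "zoom in": "zoom_in",
    "tell me more": "show_info",
    "go back": "navigate_back",
}

# Actions that only make sense when the user is looking at something.
NEEDS_TARGET = {"zoom_in", "show_info"}


def resolve_command(
    utterance: str, gaze_target: Optional[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Pair a recognized phrase with the current gaze target.

    Returns (action, target), or None if the phrase is unknown or the
    action needs a target and the user is not gazing at anything.
    """
    action = COMMANDS.get(utterance.strip().lower())
    if action is None:
        return None
    if action in NEEDS_TARGET and gaze_target is None:
        return None
    return (action, gaze_target)


if __name__ == "__main__":
    # The user looks at Mars and simply says "zoom in".
    print(resolve_command("zoom in", "Mars"))
```

The user never has to say "zoom in on Mars"; the short phrase is enough because the target comes from gaze.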

As an easy illustration, consider the Galaxy Explorer finalist. One of the core ideas is that at any time, you can navigate through the universe using natural voice commands.

Here you establish context (a planet or other feature) based on gaze, and then use voice to actually drive what you want to have happen. The user doesn’t need to explain that they’re targeting a particular world, because gaze is already telling us that. In this way, gaze will combine seamlessly with voice commands to allow you to explore the universe in a way you’ve never been able to before.

It is particularly satisfying to be able to open up voting on the anniversary of our original product announcement, when we first got to share with the world a bit about the amazing technology that we’re working on. We’ve been blown away by the level of enthusiasm that all aspects of our program have received, and look forward to continuing to engage the community as we drive towards the launch of the Development Edition. If you’re interested in a trip down memory lane, you can see a bunch of the content that we’ve released over the last year here.
