April 18, 2017

Building a Telepresence App with HoloLens and Kinect



When does the history of mixed reality start? There are lots of suggestions, but 1977 always shows up as a significant year. That’s the year millions of children – many of whom would one day become the captains of Silicon Valley – first experienced something they wouldn’t be able to name for another decade or so.

The plea of an intergalactic princess that set off a Star Wars film franchise still going strong today: “Help me, Obi-Wan Kenobi, you’re my only hope.” It’s a fascinating validation of Marshall McLuhan’s dictum that the medium is the message. While the content of Princess Leia’s message is what we have an emotional attachment to, it is the medium of the holographic projection – today we would call it “augmented reality” or “mixed reality” – that we remember most vividly.

While this post is not going to provide an end-to-end blueprint for your own Princess Leia hologram, it will provide an overview of the technical terrain, point out some of the technical hurdles and set you in the right direction. You’ll still have to do a lot of work, but if you are interested in building a telepresence app for the HoloLens, this post will help you get there.

An external camera and network connection

The HoloLens is equipped with inside-out cameras. In order to create a telepresence app, however, you are going to need a camera that can face you and take videos of you – in other words, an outside-in camera. This post is going to use the Kinect v2 as an outside-in camera because it is widely available, very powerful and works well with Unity. You may choose to use a different camera that provides the features you need, or even use a smartphone device.

The HoloLens does not allow third-party hardware to plug into its micro-USB port, so you will also need some sort of networking layer to facilitate inter-device communication. For this post, we’ll be using the HoloToolkit’s sharing service – again, because it is just really convenient to do so and even has a dropdown menu inside of the Unity IDE for starting the service. You could, however, build your own custom socket solution as Mike Taulty did, or use the Sharing with UNET code in the HoloToolkit Examples, which uses a Unity-provided networking layer.

In the long run, the two choices that will most affect your telepresence solution are what sort of outside-in cameras you plan to support and what sort of networking layer you are going to use. These two choices will determine the scalability and flexibility of your solution.

Using the HoloLens-Kinect project

Many telepresence HoloLens apps today depend in some way on Michelle Ma’s open-source HoloLens-Kinect project. The genius of the app is that it glues together two libraries, the Unity Pro plugin package for Kinect and the HoloToolkit sharing service, and uses them in unintended ways to arrive at a solution.

Even though the Kinect plugin for Unity doesn’t work in UWP (and the Kinect cannot be plugged into a HoloLens device in any case), it can still run when deployed to Windows or when running in the IDE (in which case it is using the .NET 3.5 framework rather than the .NET Core framework). The trick, then, is to run the Kinect integration in Windows and then send messages to the HoloLens over a wireless network to get Kinect and the device working together.

On the network side, the HoloToolkit’s sharing service is primarily used to sync world anchors between different devices. It also requires that a service be instantiated on a PC to act as a communication bus between different devices. The sharing service doesn’t have to be used as intended, however. Since the service is already running on a PC, it can also be used to communicate between just the PC and a single HoloLens device. Moreover, it can be used to send more than just world anchors – it can really be adapted to send any sort of primitive values – for instance, Kinect joint positions.

To use Ma’s code, you need two separate Unity projects: one for running on a desktop PC and the other for running on the HoloLens. You will add the Kinect plugin package to the desktop app. You will add the sharing prefab from the HoloToolkit to both projects. In the app intended for the HoloLens, add the IP address of your machine to the Server Address field in the Sharing Stage component.

The two apps are largely identical. On the PC side, the app takes the body stream from the Kinect and sends the joint data to a script named BodyView.cs. BodyView creates spheres for each joint when it recognizes a new body and then repositions those spheres whenever it receives updated Kinect data.

[code lang="csharp"]

private GameObject CreateBodyObject(ulong id)
{
    // One parent GameObject per tracked body, holding 25 joint spheres
    GameObject body = new GameObject("Body:" + id);
    for (int i = 0; i < 25; i++)
    {
        GameObject jointObj = GameObject.CreatePrimitive(PrimitiveType.Sphere);

        jointObj.transform.localScale = new Vector3(0.3f, 0.3f, 0.3f);
        jointObj.name = i.ToString();
        jointObj.transform.parent = body.transform;
    }
    return body;
}

private void RefreshBodyObject(Vector3[] jointPositions, GameObject bodyObj)
{
    // Move each joint sphere (named after its joint index) to its latest position
    for (int i = 0; i < 25; i++)
    {
        Vector3 jointPos = jointPositions[i];

        Transform jointObj = bodyObj.transform.FindChild(i.ToString());
        jointObj.localPosition = jointPos;
    }
}

[/code]

As this is happening, another script called BodySender.cs intercepts this data and sends it to the sharing service. On the HoloLens device, a script named BodyReceiver.cs gets this intercepted joint data and passes it to its own instance of the BodyView class that animates the dot man made up of sphere primitives.

The code used to adapt the sharing service for transmitting Kinect data is contained in Ma’s CustomMessages2 class, which is really just a straight copy of the CustomMessages class from the HoloToolkit sharing example with a small modification that allows joint data to be sent and received:

[code lang="csharp"]

public void SendBodyData(ulong trackingID, Vector3[] bodyData)
{
    // If we are connected to a session, broadcast our info
    if (this.serverConnection != null && this.serverConnection.IsConnected())
    {
        // Create an outgoing network message to contain all the info we want to send
        NetworkOutMessage msg = CreateMessage((byte)TestMessageID.BodyData);

        msg.Write(trackingID);

        foreach (Vector3 jointPos in bodyData)
        {
            AppendVector3(msg, jointPos);
        }

        // Send the message as a broadcast
        this.serverConnection.Broadcast(
            msg,
            MessagePriority.Immediate,
            MessageReliability.UnreliableSequenced,
            MessageChannel.Avatar);
    }
}

[/code]

Moreover, once you understand how CustomMessages2 works, you can pretty much use it to send any kind of data you want.
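Going the other direction, the receiving side registers a handler for the BodyData message ID and reads the values back out in the same order they were written. The sketch below is illustrative rather than a copy of Ma’s code: it assumes the NetworkInMessage read methods used by the HoloToolkit sharing example (ReadInt64, ReadFloat), and GetOrCreateBody is a hypothetical helper that looks up or creates the dot-man GameObject for a given tracking ID.

[code lang="csharp"]

// Hypothetical receive-side handler, registered once against the same message ID,
// e.g. customMessages.MessageHandlers[TestMessageID.BodyData] = this.UpdateBody;
private void UpdateBody(NetworkInMessage msg)
{
    // If CreateMessage also writes a user ID (as the HoloToolkit example does),
    // read and discard that value before anything else.
    ulong trackingID = (ulong)msg.ReadInt64();

    // Read the 25 joint positions back in the order they were appended
    Vector3[] jointPositions = new Vector3[25];
    for (int i = 0; i < 25; i++)
    {
        jointPositions[i] = new Vector3(msg.ReadFloat(), msg.ReadFloat(), msg.ReadFloat());
    }

    // Hand the joints to BodyView so the sphere primitives get repositioned
    RefreshBodyObject(jointPositions, GetOrCreateBody(trackingID));
}

[/code]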

Be one with The Force

Another thing the Kinect is very good at is gesture recognition. HoloLens currently supports a limited number of gestures and is constrained by what the inside-out cameras can see – mostly just your hands and fingers. You can use the Kinect-HoloLens integration above, however, to extend the HoloLens’ repertoire of gestures to include the user’s whole body.

For example, you can recognize when a user raises her hand above her head simply by comparing the relative positions of these two joints. Because this pose recognition only requires the joint data already transmitted by the sharing service and doesn’t need any additional Kinect data, it can be implemented completely on the receiver app running in the HoloLens.

[code lang="csharp"]

private void DetectGesture(GameObject bodyObj)
{
    string HEAD = "3";
    string RIGHT_HAND = "11";

    // detect gesture involving the right hand and the head
    var head = bodyObj.transform.FindChild(HEAD);
    var rightHand = bodyObj.transform.FindChild(RIGHT_HAND);

    // if right hand is half a meter above head, do something
    if (rightHand.position.y > head.position.y + .5)
        _gestureCompleteObject.SetActive(true);
    else
        _gestureCompleteObject.SetActive(false);
}

[/code]

In this sample, a hidden item is shown whenever the pose is detected. It is then hidden again whenever the user lowers her right arm.

The Kinect v2 has a rich literature on building custom gestures and even provides a tool for recording and testing gestures called the Visual Gesture Builder that you can use to create unique HoloLens experiences. Keep in mind that while many gesture solutions can be run directly in the HoloLens, in some cases, you may need to run your gesture detection routines on your desktop and then notify your HoloLens app of special gestures through a further modified CustomMessages2 script.
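If you do route gesture detection through the desktop, the notification can be just one more message type alongside BodyData. The following is only a sketch of what that modification might look like: the GestureDetected message ID and the single-byte payload are hypothetical additions to the CustomMessages2 pattern shown earlier.

[code lang="csharp"]

// Hypothetical addition to CustomMessages2: broadcast a detected gesture.
// GestureDetected would need to be added to the TestMessageID enum.
public void SendGestureDetected(byte gestureId)
{
    if (this.serverConnection != null && this.serverConnection.IsConnected())
    {
        NetworkOutMessage msg = CreateMessage((byte)TestMessageID.GestureDetected);
        msg.Write(gestureId);

        // Gestures are discrete events, so favor reliable delivery over speed
        this.serverConnection.Broadcast(
            msg,
            MessagePriority.Immediate,
            MessageReliability.ReliableOrdered,
            MessageChannel.Avatar);
    }
}

[/code]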

As fun as dot man is to play with, he isn’t really that attractive. If you are using the Kinect for gesture recognition, you can simply hide him by commenting out much of the code in BodyView. Another way to go, though, is to use your Kinect data to animate a 3D character in the HoloLens. This is commonly known as avateering.

Unfortunately, you cannot use joint positions for avateering. The relative sizes of a human being’s limbs are often not going to be the same as those on your 3D model, especially if you are trying to animate models of fantastic creatures rather than just humans, so the relative joint positions will not work out. Instead, you need to use the rotation data of each joint. Rotation data, in the Kinect, is represented by an odd mathematical entity known as a quaternion.

Quaternions

Quaternions are to 3D programming what midichlorians are to the Star Wars universe: They are essential, they are poorly understood, and when someone tries to explain what they are, it just makes everyone else unhappy.

The Unity IDE doesn’t expose quaternions directly. Instead, it shows rotations around the X, Y and Z axes (pitch, yaw and roll) when you manipulate objects in the Scene Viewer. These are also known as Euler angles.

There are a few problems with this, however. Using the IDE, if I try to rotate the arm of my character using the yellow drag line, it will actually rotate the green axis and the red axis along with it. Somewhat more alarming, as I try to rotate along just one axis, the Inspector window shows that my rotation around the Z axis is also affecting the rotation around the X and Y axes. The rotation angles are interlocked in such a way that even the order in which you change the X, Y and Z rotation angles will affect the final orientation of the object you are rotating. Another wrinkle with Euler angles is that they can sometimes end up in a state known as gimbal lock.
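If you want to convince yourself of the ordering problem, compose the same two world-space rotations in opposite orders and compare the results. This is just an illustrative test using Unity’s standard Transform and Quaternion APIs:

[code lang="csharp"]

private void DemonstrateRotationOrder()
{
    var a = GameObject.CreatePrimitive(PrimitiveType.Cube).transform;
    var b = GameObject.CreatePrimitive(PrimitiveType.Cube).transform;

    a.Rotate(90f, 0f, 0f, Space.World); // rotate a around X first...
    a.Rotate(0f, 90f, 0f, Space.World); // ...then around Y

    b.Rotate(0f, 90f, 0f, Space.World); // rotate b around Y first...
    b.Rotate(90f, 0f, 0f, Space.World); // ...then around X

    // The two cubes end up in different orientations: the angle between them is non-zero
    Debug.Log(Quaternion.Angle(a.rotation, b.rotation));
}

[/code]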

These are some of the reasons that avateering is done using quaternions rather than Euler angles. To better visualize how the Kinect uses quaternions, you can replace dot man’s sphere primitives with arrow models (there are lots you can find in the asset store). Then, grab the orientation for each joint, convert it to a quaternion type (quaternions have four fields rather than the three in Euler angles) and apply it to the rotation property of each arrow.

[code lang="csharp"]

private static Quaternion GetQuaternionFromJointOrientation(Kinect.JointOrientation jointOrientation)
{
    // Kinect orientations carry four components (X, Y, Z, W); map them straight into a Unity Quaternion
    return new Quaternion(jointOrientation.Orientation.X, jointOrientation.Orientation.Y, jointOrientation.Orientation.Z, jointOrientation.Orientation.W);
}

private void RefreshBodyObject(Vector3[] jointPositions, Quaternion[] quaternions, GameObject bodyObj)
{
    for (int i = 0; i < 25; i++)
    {
        Vector3 jointPos = jointPositions[i];

        Transform jointObj = bodyObj.transform.FindChild(i.ToString());
        jointObj.localPosition = jointPos;
        // Apply the joint's orientation as well as its position
        jointObj.rotation = quaternions[i];
    }
}

[/code]

These small changes result in an arrow man who will actually rotate and bend his arms as you do.

For avateering, you basically do the same thing, except that instead of mapping identical arrows to each rotation, you need to map specific body parts to these joint rotations. This post is using the male model from Vitruvius avateering tools, but you are welcome to use any properly rigged character.

Once the character limbs are mapped to joints, they can be updated in pretty much the same way arrow man was. You need to iterate through the joints, find the mapped GameObject, and apply the correct rotation.

[code lang="csharp"]

// Map each Kinect joint index to the transform path of the corresponding bone in the rigged model
private Dictionary<int, string> RigMap = new Dictionary<int, string>()
{
    {0, "SpineBase"},
    {1, "SpineBase/SpineMid"},
    {2, "SpineBase/SpineMid/Bone001/Bone002"},
    // etc …
    {22, "SpineBase/SpineMid/Bone001/ShoulderRight/ElbowRight/WristRight/ThumbRight"},
    {23, "SpineBase/SpineMid/Bone001/ShoulderLeft/ElbowLeft/WristLeft/HandLeft/HandTipLeft"},
    {24, "SpineBase/SpineMid/Bone001/ShoulderLeft/ElbowLeft/WristLeft/ThumbLeft"}
};

private void RefreshModel(Quaternion[] rotations)
{
    for (int i = 0; i < 25; i++)
    {
        if (RigMap.ContainsKey(i))
        {
            Transform rigItem = _model.transform.FindChild(RigMap[i]);
            rigItem.rotation = rotations[i];
        }
    }
}

[/code]

This is a fairly simplified example, and depending on your character rigging, you may need to apply additional transforms on each joint to get them to the expected positions. Also, if you need really professional results, you might want to look into using inverse kinematics for your avateering solution.
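One common form such a correction takes is a fixed, per-joint offset rotation composed with the incoming Kinect rotation. The dictionary below is purely illustrative; which joints need offsets, and what those offsets are, depends entirely on how your particular model was rigged.

[code lang="csharp"]

// Hypothetical per-joint corrections, typically found by trial and error
// or taken from your rigging tool; the values here are placeholders.
private Dictionary<int, Quaternion> RigOffsets = new Dictionary<int, Quaternion>()
{
    {8, Quaternion.Euler(0f, 0f, -90f)},  // e.g. right shoulder
    {9, Quaternion.Euler(0f, 0f, -90f)},  // e.g. right elbow
    // etc …
};

private Quaternion CorrectedRotation(int jointIndex, Quaternion kinectRotation)
{
    Quaternion offset;
    if (RigOffsets.TryGetValue(jointIndex, out offset))
    {
        return kinectRotation * offset;
    }
    return kinectRotation;
}

[/code]

Inside RefreshModel, you would then assign rigItem.rotation = CorrectedRotation(i, rotations[i]); instead of applying the raw Kinect rotation directly.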

If you want to play with working code, you can clone Wavelength’s Project-Infrared repository on GitHub; it provides a complete avateering sample using the HoloToolkit sharing service. If it looks familiar, that is because it happens to be based on Michelle Ma’s HoloLens-Kinect code.

Looking at point cloud data

To get even closer to the Princess Leia hologram message, we can use the Kinect sensor to send point cloud data. Point clouds are a way to represent depth information collected by the Kinect. Following the pattern established in the previous examples, you will need a way to turn Kinect depth data into a point cloud on the desktop app. After that, you will use shared services to send this data to the HoloLens. Finally, on the HoloLens, the data needs to be reformed as a 3D point cloud hologram.
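For the desktop-side conversion, the Kinect’s CoordinateMapper can project a raw depth frame into camera-space points, which is essentially your point cloud. The sketch below assumes the Kinect v2 Unity plugin (the Windows.Kinect namespace) and a _sensor field holding an open KinectSensor; it also skips color mapping and downsampling, both of which you will almost certainly want before pushing anything over the network.

[code lang="csharp"]

// Minimal sketch: convert one Kinect depth frame into camera-space points.
// depthData is assumed to have been copied out of a DepthFrame with CopyFrameDataToArray.
private Vector3[] DepthFrameToPointCloud(ushort[] depthData)
{
    var cameraPoints = new Windows.Kinect.CameraSpacePoint[depthData.Length];
    _sensor.CoordinateMapper.MapDepthFrameToCameraSpace(depthData, cameraPoints);

    var points = new List<Vector3>();
    foreach (var p in cameraPoints)
    {
        // Pixels with no depth reading come back as infinity; skip them
        if (!float.IsInfinity(p.Z) && p.Z > 0f)
        {
            points.Add(new Vector3(p.X, p.Y, p.Z));
        }
    }
    return points.ToArray();
}

[/code]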

A good tool for experimenting with point clouds is the Brekel Pro Point Cloud v2 tool, which allows you to read, record and modify point clouds with your Kinect.

The tool also includes a Unity package that replays recorded point clouds in a Unity for Windows app. The final step of transferring point cloud data over the HoloToolkit sharing service to the HoloLens is an exercise left to the reader.

If you are interested in a custom server solution, however, you can give the open source LiveScan 3D – HoloLens project a try.

HoloLens shared experiences and beyond

There are actually a lot of ways to orchestrate communication for the HoloLens, and so far we’ve mainly discussed just one. A custom socket solution may be better if you want direct HoloLens-to-HoloLens communication without having to go through a PC-based broker like the sharing service.

Yet another option is to use a framework like WebRTC for your communication layer. This has the advantage of being an open specification, so there are implementations for a wide variety of platforms such as Android and iOS. It is also a communication platform that is used, in particular, for video chat applications, potentially giving you a way to create video conferencing apps not only between multiple HoloLenses, but also between a HoloLens and mobile devices.

In other words, all the tools for doing HoloLens telepresence are out there, including examples of various ways to implement it. It’s now just a matter of waiting for someone to create a great solution.