July 24, 2013

Proximity in Windows Phone 8

At the recent Build developer conference in San Francisco, I presented a session on networking in Windows Phone 8. Part of the session was about HTTP networking using the new HttpClient portable library, but one of the more interesting parts of the session was about Proximity and how you can use it in your app to enable cool scenarios such as multiplayer gaming or seamless content transfer. In this blog post I cover much of the same ground as in the talk, which you can also watch online. You can download the accompanying source code from MSDN: Proximity APIs Sample for Windows Phone 8.

Although this blog post and the accompanying sample code are specific to Windows Phone, Windows 8 shares the Proximity APIs, and the core code works on both platforms.

Proximity connection mechanisms

Windows Phone Proximity APIs give you a high-level abstraction over lower-level hardware features such as Bluetooth, Near Field Communication (NFC), and Wi-Fi. Your app simply asks the platform to connect to a nearby device and it will provide you with a connected socket in return. The operating system does all the hard work of choosing which technology to use and setting up the low-level connection.

You can initiate proximity-based communications in two ways: by browsing for peers, or by using a tap gesture. Sometimes only one of the scenarios will make sense for your app, but in other cases using both approaches might be the most useful. Here I discuss proximity in the context of writing a game, but the concepts apply equally well to non-game apps such as a photo sharing app.

Browsing for peers

This scenario will be familiar to anyone who’s played multiplayer games like Halo 4 that have a “lobby” experience for matchmaking. Many players start the game and choose the multiplayer option, which initiates a matchmaking process to find suitable opponents. In some cases, like Halo 4, the game automatically picks the best set of players for each match. In other cases, like casual games, the user might be presented with a set of possible opponents and gets to decide which one to play against.

With Proximity in Windows Phone, the user can browse for peers in the same way, except that all other potential players must be in close proximity to – or “nearby” – the user to be discoverable (the exact definition of “nearby” depends on Bluetooth range for phones and Wi-Fi topology for PCs). This scenario works when you don’t particularly care who you will be playing against; you just need someone else who is using the app at the same time and is also looking for a match.

Browsing for peers uses the FindAllPeersAsync method, the ConnectAsync method, and the ConnectionRequested event of the PeerFinder class to locate peers, connect to peers, and respond to connection requests.
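
As a rough sketch of how those three members fit together, the host and client sides might look something like the following. This is not the sample's PeerToPeerManager code: the PeerBrowsingSketch class and its ConnectToFirstPeerAsync and ListenForRequests methods are names I made up for illustration, and error handling is left out to keep it short.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Networking.Proximity;
using Windows.Networking.Sockets;

// Hypothetical helper, not part of the sample or the platform.
public static class PeerBrowsingSketch
{
    // Host side: advertise this device, browse for nearby peers running
    // the same app, and try to connect to the first one found.
    public static async Task<StreamSocket> ConnectToFirstPeerAsync(string displayName)
    {
        PeerFinder.DisplayName = displayName; // set before calling Start
        PeerFinder.Start();

        var peers = await PeerFinder.FindAllPeersAsync();
        if (peers.Count == 0)
            return null; // nobody nearby is looking for a game

        // A real app would let the user pick from the list instead.
        return await PeerFinder.ConnectAsync(peers[0]);
    }

    // Client side: advertise this device and accept incoming requests by
    // connecting back to the peer that asked.
    public static void ListenForRequests(string displayName, Action<StreamSocket> onConnected)
    {
        PeerFinder.DisplayName = displayName;
        PeerFinder.ConnectionRequested += async (sender, args) =>
        {
            // Assumption: every request is accepted automatically; a real
            // app would normally ask the user first.
            var socket = await PeerFinder.ConnectAsync(args.PeerInformation);
            onConnected(socket);
        };
        PeerFinder.Start();
    }
}
```

To try something like this in a Windows Phone 8 project, the app manifest needs the ID_CAP_PROXIMITY and ID_CAP_NETWORKING capabilities, and both FindAllPeersAsync and ConnectAsync can throw, so real code should wrap them in try/catch.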

Tap gesture

This scenario is a much more deliberate method of connecting, more like sending a game invite directly to a friend rather than waiting in the general multiplayer lobby. Because the user’s tap gesture is very deliberate and unlikely to occur accidentally, the app knows exactly who the intended recipient is, so there’s no need for matchmaking algorithms or prompts when the user initiates a connection. This scenario works well if you really do care who you’re communicating with (especially if it might be sensitive information). It has the added benefit that the other party doesn’t need to be actively running the app.

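Tapping uses the TriggeredConnectionStateChanged event of the PeerFinder class for both the initiator and the recipient of the tap gesture. Behind the scenes, the operating system picks the transport and sets up the connection, and the event eventually hands your app a connected socket.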
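
Here is a minimal sketch of handling the tap gesture with the PeerFinder event, which is named TriggeredConnectionStateChanged. The TapToConnectSketch class, its Start method, and the status strings are hypothetical; the sample wraps this logic differently inside PeerToPeerManager.

```csharp
using System;
using Windows.Networking.Proximity;
using Windows.Networking.Sockets;

// Hypothetical helper, not part of the sample or the platform.
public static class TapToConnectSketch
{
    // The same handler serves both the phone that initiates the tap and
    // the phone that receives it.
    public static void Start(Action<StreamSocket> onConnected, Action<string> onStatus)
    {
        PeerFinder.TriggeredConnectionStateChanged += (sender, args) =>
        {
            switch (args.State)
            {
                case TriggeredConnectState.PeerFound:
                    onStatus("Tap detected");
                    break;
                case TriggeredConnectState.Connecting:
                    onStatus("Connecting...");
                    break;
                case TriggeredConnectState.Completed:
                    // The platform has finished the work; the socket is ready.
                    onConnected(args.Socket);
                    break;
                case TriggeredConnectState.Canceled:
                case TriggeredConnectState.Failed:
                    onStatus("Tap connection did not complete");
                    break;
            }
        };

        // Start advertising so that taps are routed to this app.
        PeerFinder.Start();
    }
}
```

The callbacks are not guaranteed to arrive on the UI thread, so anything that touches the UI should be marshaled back with something like Dispatcher.BeginInvoke.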

In pictures

The following diagram (lifted straight from the Build presentation) represents the two approaches a “host” can use to start a game with other devices. For the purposes of this post, the “host” is the device that is initiating the proximity-based scenario (either by browsing or by tapping):

This diagram represents the two approaches that the “client” can use to connect to a host. For the purposes of this post, the “client” is the device that reacts to the proximity-based scenario (either by waiting for the host or by accepting the tap):

In some cases, you might want every device to act as both a host and a client and to provide your own algorithm for negotiating how the nodes connect to each other. In the sample game provided with this post, there is always one host and one client.

ProximityDevice Class

Neither of these scenarios uses the ProximityDevice class, which you use for publishing and receiving stand-alone messages rather than for setting up ongoing conversations. One typical use of ProximityDevice (which I don’t cover here) is an app that can read NFC tags embedded in a poster, for example.

Sample app

The sample code implements a simple question-and-answer game. Basically it works like this:

1. Player A starts the game by opening the app, typing a display name, and then pressing start game.
2. Either Player A taps their phone with Player B’s phone (initiating a tap-and-play scenario), or Player B starts the app, types in their own display name, and presses look for game.
   a. In the “look for game” scenario, Player A sees Player B on their phone as a possible peer, and explicitly connects their phone to Player B’s by pressing go. Player B must then do the same.
   b. If multiple people nearby have all pressed look for game in the same timeframe, they all appear as possible opponents on Player A’s phone. Player A can choose which person to play against.
   c. For tap-and-play to work, Player B’s phone must be on and unlocked (NFC is not active if the phone is off or is locked), but they don’t need to be actively running the app.
3. After the two players connect, Player A types a question, and then presses send.
4. Player B types an answer to the question, and then presses send.
5. Player A responds by pressing either correct 🙂 or incorrect 🙁.
6. The game cycles back to the screen in step 3.

There’s no scoring in this sample app, and no way for the players to switch roles – it’s a simple game that illustrates the basic concepts of using Proximity in Windows Phone.

High-level architecture

The basic app looks like this:

MainPage references three View Models: Lobby, Asker, and Answerer.

The Lobby View Model interacts with PeerToPeerManager, which is the main focus of this post, and it provides some high-level wrappers over the core Proximity APIs. After Lobby has successfully extracted a socket from PeerToPeerManager, the socket is passed to RemotePlayer, which exposes a high-level API to communicate with another player.

Asker and Answerer interact only with RemotePlayer and do not have any knowledge of the underlying Sockets or Proximity APIs.

PeerToPeerManager

PeerToPeerManager is a relatively thin wrapper around the PeerFinder APIs. It supports lobby-style scenarios and tap-to-play scenarios. The public surface area looks like this:

Here’s a brief description of each member and what it does.

ConnectToPeerAsync connects to a peer that is discovered and reported by the CandidatePeersAvailable event as a result of a browsing scenario; if successful, the method returns the connected socket.
GetDefault is the standard WinRT way to implement the singleton pattern; you call it to get the single instance of the PeerToPeerManager type (you can’t create a new instance).
ResetAfterError tries to recover after the StateChanged event has reported an error. This isn’t particularly robust but is good enough for recovering from common peer browsing errors.
StartClient / StopClient start and stop acting as a client, that is, listening for connection requests and accepting tap gestures.
StartHost / StopHost start and stop acting as a host, that is, browsing for peers and advertising the tap gesture.
DisplayName is the name that will be used to identify this device when displayed on other devices.
State / StateChanged provide the current state of the manager, for example, “connecting”.
CandidatePeersAvailable is raised in host scenarios whenever peers have been located via browsing, or in client scenarios whenever a host has connected; if the peer is the one that you want to connect to, you pass it to ConnectToPeerAsync.
TapAndPlayPeerFound / TapAndPlayPeerConnected are raised in both client and host scenarios whenever a tap-and-play gesture is in progress. The Connected event provides the connected socket. The Found event is mostly useful for updating UI.
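
Since a member list like this is often easier to take in as code, here is a rough sketch of that surface written as an interface. It is only my reading of the descriptions above, not the declaration from the sample: the interface name, the generic TPeer and TSocket parameters (standing in for PeerInformation and StreamSocket), the ManagerState values, and every delegate signature are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical state names; the post only gives "connecting" as an example.
public enum ManagerState { Idle, Browsing, Listening, Connecting, Connected, Error }

// TPeer and TSocket stand in for the platform's peer and socket types so
// that this sketch compiles without the Proximity APIs.
// GetDefault is static in the sample, so it can't appear on an interface.
public interface IPeerToPeerManager<TPeer, TSocket>
{
    // Name used to identify this device on other devices.
    string DisplayName { get; set; }

    // Current state, plus a notification whenever it changes.
    ManagerState State { get; }
    event Action<ManagerState> StateChanged;

    // Host: browse for peers and advertise the tap gesture.
    void StartHost();
    void StopHost();

    // Client: listen for connection requests and accept tap gestures.
    void StartClient();
    void StopClient();

    // Browsing: candidates are reported, then one is passed to ConnectToPeerAsync.
    event Action<IReadOnlyList<TPeer>> CandidatePeersAvailable;
    Task<TSocket> ConnectToPeerAsync(TPeer peer);

    // Tap-and-play: Found is mostly for UI; Connected carries the socket.
    event Action<TPeer> TapAndPlayPeerFound;
    event Action<TSocket> TapAndPlayPeerConnected;

    // Best-effort recovery after StateChanged reports an error.
    void ResetAfterError();
}
```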

The main benefit of using this wrapper instead of the raw PeerFinder APIs is that you can think in simple terms of “I’m the host” or “I’m the client” and not have to worry about what TriggeredConnectionStateChanged means and whether you need to call FindAllPeersAsync or wait for the ConnectionRequested event (or both!). And because it hides all of the implementation details of the Proximity APIs, you could extend the PeerToPeerManager class to add other matchmaking solutions, such as a server-based game lobby, without changing any client code whatsoever. You could also extend it to use TCP/IP sockets inside the emulator to facilitate easier testing using a companion desktop app.

DataReader and DataWriter extensions

Another interesting part of the sample is the MiscExtensions.cs file that contains some extension methods for DataReader and DataWriter. The code itself is a little hard to follow because it centralizes some error handling (and supports a gratuitous set of “receive” and “transmit” lights in the UI) but its primary purpose is to make it possible to correctly read and write strings. Although DataReader and DataWriter already have methods to read and write strings, they might not work the way you’d expect.

Imagine you want to send (and receive) the string “Pizza” over a socket, and for the sake of discussion assume the data is encoded in UTF-8. You might assume that because “Pizza” has five characters, and UTF-8 encodes each character as a single byte, you only need to write 5 bytes to the stream and then read 5 bytes out on the other end.

So you would write the following code:

var text = "Pizza";
writer.WriteByte((byte)text.Length); // send byte count
writer.WriteString(text);
await writer.StoreAsync();

And that would send the following bytes down the wire:

5 , ‘P’ , ‘i’ , ‘z’ , ‘z’ , ‘a’.

On the receiving end, you’d do something like this:

await reader.LoadAsync(1);  // read byte count
var stringLength = reader.ReadByte();
await reader.LoadAsync(stringLength);
var text = reader.ReadString(stringLength);

And it would work! You’d get the string “Pizza” correctly across the wire. But what if you want to send the string “Pizza🍕”? That is, the word “Pizza” followed by the Unicode character SLICE OF PIZZA (yes it’s part of the standard)? In case your browser can’t show that character, it looks like this on Windows Phone:

If you try running the same code, it won’t work:

The hint here is the “No mapping for the Unicode character exists” text – something has gone wrong with reading the string back. If you inspect the bytes sent over the wire, you’ll see this:

7 , ‘P’ , ‘i’ , ‘z’ , ‘z’ , ‘a’ , 240 , 159 , 141 , 149

Something unexpected is that even though the string looks like it has six characters, the length is actually reported as being seven characters. Stranger still, there are actually 9 bytes in the transmitted string! That’s because the SLICE OF PIZZA character actually takes 4 bytes to encode, and represents 2 Char objects (16-bit values) in the string.
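
You can check those numbers without a socket. The following sketch uses System.Text.Encoding rather than DataWriter, and the SliceOfPizza class and its members are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for inspecting how the emoji is stored and encoded.
public static class SliceOfPizza
{
    // U+1F355 SLICE OF PIZZA, written as an escape so that the encoding of
    // the source file doesn't matter.
    public const string Emoji = "\U0001F355";

    // Number of 16-bit Char values in the string (what String.Length reports).
    public static int CharCount(string text)
    {
        return text.Length;
    }

    // The bytes that UTF-8 produces for the string.
    public static byte[] Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // True when the string is exactly one character stored as a surrogate pair.
    public static bool IsSingleSurrogatePair(string text)
    {
        return text.Length == 2 && char.IsSurrogatePair(text[0], text[1]);
    }
}
```

With this, CharCount("Pizza" + SliceOfPizza.Emoji) is 7 even though the string shows six characters, and Utf8Bytes returns nine bytes for it, the last four being 240, 159, 141, 149.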

Your next step might be to use the MeasureString method to calculate the number of bytes required to correctly encode a string. You might revise the code this way:

var text = "Pizza🍕";
var byteCount = writer.MeasureString(text);
writer.WriteByte((byte)byteCount);
writer.WriteString(text);
await writer.StoreAsync();

and

await reader.LoadAsync(1); // read byte count
var byteCount = reader.ReadByte();
await reader.LoadAsync(byteCount);
var text = reader.ReadString(byteCount);

Now you can happily send and receive strings that contain arbitrary Unicode characters and they will work perfectly.

But what if you switch from UTF-8 to UTF-16? (Hint: The code will break again.) It turns out that MeasureString probably doesn’t work the way you might expect, either. It doesn’t return the length of the string in characters, nor does it return the size of the string in bytes. Instead, it returns the number of units required to hold the string, where the unit is either 1 byte for UTF-8 or 2 bytes for UTF-16, as determined by the UnicodeEncoding property. So our string “Pizza🍕” needs 9 units (and thus 9 bytes) in UTF-8, but only 7 units (or 14 bytes) in UTF-16. On the receiving end, LoadAsync requires the number of bytes to load, and ReadString requires the number of units to read. Confused yet? 🙂
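
If the units-versus-bytes distinction is still fuzzy, this small sketch reproduces the arithmetic with System.Text.Encoding. It only mirrors the behavior described above; it doesn't call MeasureString, and the StringMeasure class with its UnitCount and ByteCount methods is hypothetical.

```csharp
using System.Text;

// Hypothetical helper that mirrors the unit and byte arithmetic described
// in the post, without using DataWriter.
public static class StringMeasure
{
    // Code units needed for the string: one unit per UTF-8 byte, or one
    // unit per 16-bit Char for UTF-16.
    public static uint UnitCount(string text, bool utf8)
    {
        return utf8 ? (uint)Encoding.UTF8.GetByteCount(text) : (uint)text.Length;
    }

    // Bytes on the wire: a unit is 1 byte in UTF-8 and 2 bytes in UTF-16.
    public static uint ByteCount(string text, bool utf8)
    {
        uint units = UnitCount(text, utf8);
        return utf8 ? units : units * 2;
    }
}
```

For "Pizza" followed by the pizza character, UnitCount and ByteCount both give 9 in UTF-8, while in UTF-16 they give 7 and 14. The byte count is the number you would hand to LoadAsync and the unit count is the number you would hand to ReadString.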

To solve this, we end up sending both values down the wire. We could just compute the number of units on the receiving end, but I personally like the explicitness of sending them both and the simplicity of not having to worry about it. This is what it looks like:

var text = "Pizza🍕";
var unitCount = writer.MeasureString(text);
uint byteCount = unitCount;

// Double the number of bytes for UTF-16 (each unit is 2 bytes)
if (writer.UnicodeEncoding != UnicodeEncoding.Utf8)
  byteCount = unitCount * 2;

writer.WriteUInt32(sizeof(UInt32) + byteCount); // Total bytes to read
writer.WriteUInt32(unitCount); // Length in units
writer.WriteString(text); // The string
await writer.StoreAsync(); // Send the buffered data

and

await reader.LoadAsync(sizeof(UInt32)); // Load the total byte count
var expectedBytes = reader.ReadUInt32();
await reader.LoadAsync(expectedBytes); // Load the unit count and the string
var unitCount = reader.ReadUInt32();
string result = reader.ReadString(unitCount);

The good news is that the extensions to DataReader and DataWriter handle the details of creating the length-prefixed strings, regardless of encoding. They also encode and verify the expected length of the decoded string (yet another value), which can be useful for debugging, and might help if you’re dealing with legacy code that doesn’t correctly handle Unicode encodings. For example, the string “Pizza🍕” and the string “Pizza????” are both nine bytes (units) long in UTF-8, but the former has a length of seven characters, and the latter a length of nine characters. If something happens along the way to break the UTF-8 encoding, the code will detect it and throw an exception.
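
To make the framing and the length check concrete without needing a socket, here is a minimal stand-in that uses BinaryWriter and BinaryReader over a byte array instead of DataWriter and DataReader. It is not the code from MiscExtensions.cs. The LengthPrefixedStrings class and its frame layout (total byte count, then the expected Char count, then the encoded bytes) are assumptions for illustration. Because it decodes straight from raw bytes it doesn't need the unit count, and BinaryWriter always writes integers little-endian, which may not match the DataWriter's ByteOrder setting.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the sample's DataReader/DataWriter extensions.
public static class LengthPrefixedStrings
{
    // Frame layout: [bytes that follow : uint32][expected Char count : uint32][payload]
    public static byte[] Write(string text, Encoding encoding)
    {
        byte[] payload = encoding.GetBytes(text);
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write((uint)(sizeof(uint) + payload.Length)); // total bytes to read
            writer.Write((uint)text.Length);                     // decoded length in Chars
            writer.Write(payload);                               // the encoded string
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static string Read(byte[] frame, Encoding encoding)
    {
        using (var reader = new BinaryReader(new MemoryStream(frame)))
        {
            uint remaining = reader.ReadUInt32();
            uint expectedChars = reader.ReadUInt32();
            byte[] payload = reader.ReadBytes((int)(remaining - sizeof(uint)));
            string text = encoding.GetString(payload);

            // Catch payloads that were mangled along the way, such as the
            // emoji bytes being replaced with question marks.
            if (text.Length != expectedChars)
                throw new InvalidDataException("Decoded length does not match the expected length.");

            return text;
        }
    }
}
```

A frame written with Encoding.UTF8 round-trips through Read unchanged, and the same is true for Encoding.Unicode (UTF-16). If the four emoji bytes in the UTF-8 frame are overwritten with '?' characters, Read decodes nine characters where seven were expected and throws.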

Remainder of the sample code

The rest of the code is pretty straightforward, generally following an MVVM-style design (although I’m sure some purists might disagree with that statement). There is a lot of logging code throughout the app; this really helps during development and testing, because unless you are set up to debug two phones at the same time (on two different PCs, with the source code in sync) typically you’ll be flying blind on one of the phones. In addition to dumping the log info in the Output window in Visual Studio, the logs are sent to a hidden ListBox on the main page.

Double-tapping the “q+a game” heading shows or hides the log, which can help you diagnose unexpected issues. Calls to the logging code are left out of a Release build because the logging methods use the Conditional attribute. I’ve used minimal commenting in the code, mostly because the logging acts as a good running commentary.
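
If you haven't used the Conditional attribute before, the pattern looks roughly like this. The Log class, its Entries list, and the choice of the DEBUG symbol are assumptions for illustration; the sample's logger is more involved.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical logger showing the Conditional pattern.
public static class Log
{
    // Backing store that a hidden ListBox could bind to.
    public static readonly List<string> Entries = new List<string>();

    // When DEBUG is not defined, the compiler drops every call to this
    // method (including evaluation of its arguments) at the call site.
    [Conditional("DEBUG")]
    public static void Write(string message)
    {
        Debug.WriteLine(message);
        Entries.Add(message);
    }
}
```

A call such as Log.Write("peer found") adds an entry in a Debug build and compiles to nothing in a Release build, so you can leave the calls in place without paying for them in the shipped app.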

Finally, although I haven’t tested this sample code on Windows 8.1 or Windows 8, the core code should work with minimal changes as I mention in the introduction. You will need to update the UI, of course, and there are some Windows Phone-specific dependencies sprinkled throughout (such as Dispatcher.BeginInvoke), but you should be able to get the basics up and running pretty easily. If you want to port the code to Windows, Matt Hidinger’s presentation on sharing code across Windows and Windows Phone will come in handy.
