June 21, 2016


FamilyNotes: Introducing a Windows UWP sample using ink, speech, and face recognition

Windows Apps Team

If you watched the Build 2016 keynotes, you may have noticed that Microsoft is focusing more and more on improving how people interact with computers through ink, speech, and other “more personal computing” features. Modern software is shifting toward natural interfaces wherever possible, providing friendly cues and intelligent, context-aware feedback.

To help you add similar capabilities to your own projects, check out “FamilyNotes,” a newly published Universal Windows Platform (UWP) sample app. It’s a notice board app designed to demonstrate modern features in a real-world scenario, with support for ink, speech, and some rather impressive behind-the-scenes “smarts” powered by Microsoft Cognitive Services.

Today, we are introducing a three-post series about FamilyNotes. We’ll cover what FamilyNotes does, how it works, and the challenges we faced while writing it. Here is a map of the series, starting with this post:

Introduction, features, and the app development story
Face detection and recognition
Ink and speech

Let’s get started.

https://channel9.msdn.com/Blogs/One-Dev-Minute/Using-Ink-Voice-and-Face-Recognition-in-a-UWP-App/

Important note regarding privacy

When the facial recognition feature is activated, this app detects faces and submits images of them to the Microsoft Cognitive Services platform. When testing, make sure everyone who uses the app is OK with Microsoft potentially keeping copies of the transmitted images. If you publish an app that uses this feature, you must include a privacy statement that explains this. You should also refer to Microsoft’s own privacy statement.

App features

FamilyNotes has a variety of features:

Users can create and present “sticky notes” for everyone in the family.
Users can add new notes by typing, inking, or dictating them.
Users can activate the app with Cortana and voice commands, and start a new note for the entire family or for a specific family member.
Users can have the app use face detection and face recognition to automatically display the notes for a specific person.
The app is written in C# and XAML, following a Model-View-Controller architecture.
The source code is generously documented.

Here’s what you’ll need to compile and run the app:

The FamilyNotes sample on GitHub, which you can download by clicking the “Download Zip” button and extracting the archive.
A computer running Windows 10 with a camera and microphone, preferably with ink and touch support. An external webcam should work fine if your computer doesn’t have a built-in camera.
Visual Studio 2015, Community Edition or better, with Universal Windows App Development Tools.
A subscription key for the Microsoft Face API. For information about getting a free trial key, see the Microsoft Cognitive Services site.

Downloading, building, and running FamilyNotes

The default project is called FamilyNotes. Once you’ve opened it in Visual Studio, you can choose Start Debugging (F5) or Start Without Debugging (Ctrl+F5) to try it out. The app will run in the Simulator, but running it on the Local Machine is the better way to make sure camera and microphone support works as expected. There is no support for a phone layout at this time.

The app will launch with a default family group (“Everyone”). Tap the New Person button in the command bar and add some family members. If you have a camera and want to test the face detection and recognition features, you can take a snapshot of each person, too.

Now you can tap the yellow New note button and the list of family members will appear. Tap a family member to create a note, and then add the content by typing with the keyboard or scribbling with a pen.

Talk to me

FamilyNotes makes extensive use of speech in multiple ways. After the app has been launched at least once, Cortana will begin to respond to voice commands. When the app is not running, you can say “Hey Cortana, FamilyNotes” to activate it, or “Hey Cortana, FamilyNotes add new note for everyone” to activate it and add a new note.

When the app is running, you can use phrases such as “Add new note for <person name>” to start a new note, or “Read note” to have it read back to you.

You can also dictate the content of a note by tapping the Dictate button. Dictation is enabled automatically if you create the note with Cortana.
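The in-app phrases above follow a simple pattern: a fixed command plus, optionally, a family member’s name. Here is a minimal sketch in Python (chosen for brevity; FamilyNotes itself is written in C#) of how recognized text might be mapped to an action. It assumes the recognizer hands back a plain string, and the `parse_command` function, the regular expressions, and the returned tuples are hypothetical illustrations, not the app’s actual code or the Windows speech API.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical patterns modeled on the phrases described above.
ADD_NOTE = re.compile(r"^add (?:a )?new note for (?P<name>.+)$", re.IGNORECASE)
READ_NOTE = re.compile(r"^read (?:the )?note$", re.IGNORECASE)


def parse_command(text: str, members: Iterable[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Map recognized text to an (action, person) tuple, or None if nothing matches."""
    phrase = text.strip().rstrip(".")
    match = ADD_NOTE.match(phrase)
    if match:
        spoken = match.group("name").strip().lower()
        # Only accept names that belong to a known family member (case-insensitive).
        for person in members:
            if person.lower() == spoken:
                return ("add_note", person)
        return None  # unknown name: the caller could ask the user to repeat it
    if READ_NOTE.match(phrase):
        return ("read_note", None)
    return None


members = ["Everyone", "Anna", "Ben"]
print(parse_command("Add new note for anna", members))  # ('add_note', 'Anna')
print(parse_command("Read note", members))  # ('read_note', None)
```

Matching the spoken name against the existing family list, rather than accepting any text, means an unrecognized name can lead to a re-prompt instead of a note addressed to nobody.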

Are you looking at me?

Tapping a person listed on the left filters the board to highlight only the notes tagged for that person. And here’s the cool part: if your PC has a camera, this can happen automatically using face recognition.

To enable this feature, you will need to sign up for a Microsoft Cognitive Services account (a free demo account is available) to receive a key. Once you have the key, enter it in the app’s Settings view. If you created a family member using the “Snapshot” option to take a picture, the app will now detect that person’s face and filter the notes to the recognized user.

Note: The Microsoft Face API subscription key must be entered in the app’s Settings menu before facial recognition can be used. To open the Settings menu, click the gear button on the app’s command bar. Please read the terms and conditions for Microsoft Cognitive Services before using it.
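Whether the person comes from a tap on the list or from face recognition, the final step is the same: filter the notes by who they are for. Here is a minimal Python sketch of that step under stated assumptions. The `Note` record, the `(name, confidence)` pairs standing in for the face service’s response, and the 0.5 threshold are hypothetical choices for illustration, not values taken from FamilyNotes or the Face API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Note:
    addressee: str  # family member the note is for, e.g. "Everyone" or "Anna"
    text: str


def notes_for(notes: Sequence[Note], person: Optional[str]) -> List[Note]:
    """Return the notes addressed to `person`, or every note if `person` is None."""
    if person is None:
        return list(notes)
    return [note for note in notes if note.addressee == person]


def on_face_identified(
    notes: Sequence[Note],
    candidates: Sequence[Tuple[str, float]],
    threshold: float = 0.5,
) -> List[Note]:
    """Filter by the most confident (name, confidence) candidate at or above `threshold`."""
    best = max(candidates, key=lambda c: c[1], default=None)
    person = best[0] if best is not None and best[1] >= threshold else None
    return notes_for(notes, person)


board = [
    Note("Everyone", "Pizza night on Friday"),
    Note("Anna", "Dentist at 3"),
    Note("Ben", "Feed the cat"),
]
print([n.text for n in on_face_identified(board, [("Anna", 0.92), ("Ben", 0.31)])])
# ['Dentist at 3']
print(len(on_face_identified(board, [("Ben", 0.2)])))  # 3: not confident enough, so nothing is filtered
```

Falling back to the unfiltered board when no candidate is confident enough keeps an unknown face (or a poor snapshot) from hiding everyone’s notes.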

A brief history of the app’s development

We designed our notice board app to demonstrate the many MPC (our acronym for “More Personal Computing”) features of Windows 10. As with all good projects, we started with several whiteboards full of drawings: at one end the proposed user interface, and at the other, the proposed data model. Eventually, we met in the middle.

When planning an app, everyone is flush with excitement and keen to include as many features as possible.

“Let’s allow notes to be assigned to multiple people!”
“Let’s allow voice mail!”
“Let’s add video recording!”
“Let’s sync notes to phones over Azure!”

Instead of including everything and the kitchen sink in version 1, we decided to focus on implementing the technologies in a way that would be easy to understand, rather than on creating a hit Windows Store app that would let us all retire to the Caribbean. When you think about it, this “minimal usable product” approach is good advice for version 1 of any product.

We spun up a git repo to track our work and split into teams. In separate branches, each team prototyped ink controls, camera controls, and voice to make sure we could get core features working.

This prototyping phase gave us a clear idea of which features we wanted to implement, how they might work, and roughly how the app might look. In an ideal world, we would have engaged dedicated designers at this point, but with typical Programmer Hubris, we went ahead with our own ideas. When designing your own app, you might be less egotistical than we were and bring in actual designers at this stage to work with you in Blend or Photoshop.

We soon reached the stage where we needed to throw away a lot of prototype code and put a proper XAML-based, model-view-controller architecture in place. The core of our app was the data model: a collection of family members and a collection of sticky notes. We stored both lists in ObservableCollections so that data binding would automatically update the XAML user interface whenever a list changed. For example, adding a person object to the family collection would cause the XAML for a “person button” to appear in the column on the left of the screen.
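ObservableCollection raises a CollectionChanged event whenever items are added or removed, and XAML data binding listens for that event to update the UI. The Python sketch below imitates that pattern with a hypothetical `ObservableList` class, with a plain callback standing in for the binding engine that creates a “person button.” It illustrates the idea only; it is not how XAML binding is actually implemented.

```python
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ObservableList(Generic[T]):
    """Minimal stand-in for .NET's ObservableCollection<T>: notifies listeners on change."""

    def __init__(self) -> None:
        self._items: List[T] = []
        self._listeners: List[Callable[[str, T], None]] = []

    def subscribe(self, listener: Callable[[str, T], None]) -> None:
        # In XAML, the binding engine plays this role by handling CollectionChanged.
        self._listeners.append(listener)

    def _notify(self, action: str, item: T) -> None:
        for listener in self._listeners:
            listener(action, item)

    def append(self, item: T) -> None:
        self._items.append(item)
        self._notify("add", item)

    def remove(self, item: T) -> None:
        self._items.remove(item)
        self._notify("remove", item)

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


# Hypothetical "view": button labels kept in sync with the family collection.
person_buttons: List[str] = []


def update_buttons(action: str, person: str) -> None:
    if action == "add":
        person_buttons.append(f"[{person}]")
    elif action == "remove":
        person_buttons.remove(f"[{person}]")


family: ObservableList[str] = ObservableList()
family.subscribe(update_buttons)
family.append("Anna")
family.append("Ben")
family.remove("Anna")
print(person_buttons)  # ['[Ben]']
```

The model code never touches the view directly; it only changes the collection, and every subscriber reacts. That separation is what let the FamilyNotes UI stay in step with the data model.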

Once the data model was set, we decided to freeze the addition of new features. Well, almost. We realized that having the app speak the content of notes was an important (and cool) feature that wouldn’t require drastic code changes, so we decided to implement that, too.

Then, we realized that another low-impact addition was the ability to display the Bing Image of the Day as the backdrop. This was also a good incentive to improve the appearance of the XAML notes. It did mean adding a Settings panel, but after that we definitely, definitely decided no more features. Really. We meant it this time. No. More. Features.

[Image: More UI refinements, including ditching the audio and photo options.]

The last 10% takes 90% of the time…

As is always the way with software development, toward the end of the project it seemed like we were always just one fix away from the last bug, and then another one would appear. Sometimes, these bugs were edge cases that showed up only in rare situations. Sometimes, they manifested only when we demonstrated the app to management.

My favorite bug happened when we were testing the dictation mode. Every time we started dictating a note, it was preceded by the string “dictate your note.” We thought we had left a debugging message in our code, but we couldn’t find it anywhere in the source. And it only happened on some computers, not all. It took a little while to realize that the computer was hearing itself say, “Please start dictating your note,” recognizing its own speech as a human voice, and adding it to the note. The testers who had their volume turned down didn’t see the string appear because the computer couldn’t hear itself!

When your computer starts talking to itself, you know the software is getting smart.
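This post doesn’t describe the exact fix, but one common way to keep an app from transcribing its own prompts is to discard speech that was captured while the app was talking. The Python sketch below assumes the app records when each spoken prompt starts and finishes, and that every recognition result carries a timestamp for when it was heard; the `EchoGate` class and its methods are hypothetical.

```python
from typing import List, Optional, Tuple


class EchoGate:
    """Drops recognition results that overlap the app's own speech output."""

    def __init__(self) -> None:
        self._spoken: List[Tuple[float, float]] = []  # (start, end) of each prompt, in seconds
        self._speaking_since: Optional[float] = None

    def prompt_started(self, t: float) -> None:
        self._speaking_since = t

    def prompt_finished(self, t: float) -> None:
        if self._speaking_since is not None:
            self._spoken.append((self._speaking_since, t))
            self._speaking_since = None

    def accept(self, heard_at: float, text: str) -> Optional[str]:
        """Return the text if it was heard outside every prompt, otherwise None."""
        if self._speaking_since is not None and heard_at >= self._speaking_since:
            return None  # the app is still talking
        for start, end in self._spoken:
            if start <= heard_at <= end:
                return None  # captured while a prompt was playing
        return text


gate = EchoGate()
gate.prompt_started(0.0)
gate.prompt_finished(2.5)  # "Please start dictating your note"
print(gate.accept(1.2, "dictate your note"))  # None
print(gate.accept(3.0, "buy milk"))  # buy milk
```

A simpler alternative with the same effect is to start listening only after the spoken prompt has finished playing.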

Wrapping up

In our next post, we’ll get into the code and look at the work that went into the face detection and recognition features. We’ll also explore how you might add these features to your own apps. In the meantime, feel free to download the project and try it out. We’ve been working on updates and improvements to the app, so if you have already downloaded it, check it out again.

Feel free to ask any questions, either here on the blog or in the GitHub repo. Your feedback is critical!

Additional information

Written by John Kennedy, Senior Content Publisher Manager for Windows & Devices Group
