May 2, 2018 10:05 am

Bringing Screen Capture to Microsoft Edge with the Media Capture API

Beginning with EdgeHTML 17, Microsoft Edge is the first browser to support Screen Capture via the Screen Capture API. Web developers can start building on this feature today by upgrading to the Windows 10 April 2018 Update, or by using one of our free virtual machines.

Screen Capture uses the new getDisplayMedia API specified by the W3C Web Real-Time Communications Working Group. The feature lets web pages capture the output of a user’s display device, and is commonly used to broadcast a desktop for plugin-free virtual meetings or presentations. Using Media Capture, Microsoft Edge can capture all Windows applications, including Win32 and Universal Windows Platform applications (UWP apps).

In this post, we’ll walk through how Screen Capture is implemented in Microsoft Edge, and what’s on our roadmap for future releases, as well as some best practices for developers looking to get started with this API today.

Getting started with the Screen Capture API

The getDisplayMedia() method is the heart of the Screen Capture API. The getDisplayMedia() call takes MediaStreamConstraints as an optional input argument. Once the user grants permission, the getDisplayMedia() call returns a promise that resolves with a MediaStream object representing the user-selected capture device.
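As a rough illustration of the call described above, here is a minimal sketch. The helper name requestScreenCapture, the injected nav parameter, and the { video: true } default are assumptions made for this example; they are not part of the API.

```javascript
// Minimal sketch of requesting a screen capture stream.
// `nav` is injected so the function can also run outside a browser;
// on a page in EdgeHTML 17 you would pass `navigator`.
function requestScreenCapture(nav, constraints) {
  if (!nav || typeof nav.getDisplayMedia !== "function") {
    return Promise.reject(new Error("getDisplayMedia is not available"));
  }
  // The constraints argument is optional; { video: true } is an assumed default.
  return nav.getDisplayMedia(constraints || { video: true });
}

// Possible usage on a page (not executed here):
//   requestScreenCapture(navigator).then(function (stream) { /* use stream */ });
```

Checking for the method before calling it lets a page fall back gracefully in browsers that do not expose getDisplayMedia.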

The MediaStream object will only have a MediaStreamTrack for the captured video stream; there is no MediaStreamTrack corresponding to a captured audio stream. The MediaStream object can be rendered on multiple rendering targets, for example, by setting it on the srcObject attribute of a media element (e.g. a video tag).
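The sketch below shows one way to render the same stream on several targets and to confirm which tracks it carries. The helper name renderCapture and its return shape are hypothetical; getVideoTracks(), getAudioTracks() and srcObject are the standard MediaStream and media element members.

```javascript
// Sketch: render one captured MediaStream on one or more media elements.
// `elements` are assumed to be <video> elements (anything with srcObject).
function renderCapture(stream, elements) {
  for (const element of elements) {
    element.srcObject = stream; // same stream, multiple rendering targets
  }
  // Report the track counts so callers can confirm what the stream carries.
  return {
    videoTracks: stream.getVideoTracks().length,
    audioTracks: stream.getAudioTracks().length,
  };
}
```

For a stream returned by getDisplayMedia in this release, the text above implies the audio track count would be zero.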

While the operation of the getDisplayMedia API is superficially very similar to getUserMedia, there are some important differences. To ensure users are in control of any sensitive information which may be captured, getDisplayMedia does not allow the MediaStreamConstraints argument to influence the selection of sources. This is different from getUserMedia, which enables picking a specific capture device.

Our implementation of Screen Capture currently does not support the use of MediaStreamConstraints to influence MediaStreamTrack characteristics (such as framerate or resolution). The getSettings() method can’t be used to obtain the type of display surface that was captured, although information such as the width, height, aspect ratio and framerate of the capture can be obtained. Within the W3C Web Real-Time Communications Working Group there is ongoing discussion of how MediaStreamConstraints influences properties of the captured screen device, such as resolution and framerate, but consensus has not yet been reached.
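To illustrate the kind of information getSettings() does expose, here is a small sketch that reads the capture dimensions and framerate from the video track. The helper name describeCapture is an assumption for this example; width, height, aspectRatio and frameRate are the standard MediaTrackSettings member names.

```javascript
// Sketch: summarize the settings of the captured video track.
// Returns null when the stream has no video track.
function describeCapture(stream) {
  const [track] = stream.getVideoTracks();
  if (!track) {
    return null;
  }
  // Only the members the text says are available are read here.
  const { width, height, aspectRatio, frameRate } = track.getSettings();
  return { width, height, aspectRatio, frameRate };
}
```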

User permissions

While screen capture functionality can enable a lot of exciting user and business scenarios, removing the need for additional third-party software, plugins, or manual user steps for scenarios such as conference calls and desktop screenshots, it also introduces security and privacy concerns. Explicit, opt-in user consent is a critical part of the feature.

While the W3C specification recommends some best practices, it also leaves each browser some flexibility in implementation. To balance security and privacy concerns and user experiences, our implementation requires the following:

  • An HTTPS origin is required for getDisplayMedia() to be called.
  • The user is prompted to allow or deny screen capture when getDisplayMedia() is called.
  • The user’s permission choice persists, but the capture picker UI still comes up for each getDisplayMedia() call. Permissions can be managed via the site permissions UI in Microsoft Edge (in Settings or via the site info panel in the URL bar).
  • If a webpage calls getDisplayMedia() from an iframe, we will manage the screen capture device permission separately based on its own URL. This provides protection to the user in cases where the iframe is from a different domain than its parent webpage.
  • As noted above, we do not permit MediaStreamConstraints to influence the selection of getDisplayMedia screen capture sources.
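A page can check the first requirement in the list above before it offers a "share screen" button. The following is a minimal sketch; the helper name screenCapturePreflight and the wording of its messages are hypothetical, and the two parameters stand in for window.location and navigator.

```javascript
// Sketch: collect reasons why getDisplayMedia() could not be called here.
// `loc` is assumed to look like window.location, `nav` like navigator.
function screenCapturePreflight(loc, nav) {
  const problems = [];
  if (loc.protocol !== "https:") {
    problems.push("page is not served from an HTTPS origin");
  }
  if (typeof nav.getDisplayMedia !== "function") {
    problems.push("getDisplayMedia is not exposed by this browser");
  }
  return problems; // an empty array means the call can be attempted
}
```

Even when the preflight passes, the user can still deny the prompt, so the promise returned by getDisplayMedia() should always have a rejection handler.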

Sample scenarios using screen capture

Screen capture is an essential step in many scenarios, including real-time audio and video communications. Below we walk through a simple scenario introducing you to how to use the Screen Capture functionality.

Capture photo from a screen capture device

Let’s assume we have a video tag on the page and it is set to autoplay. Prior to calling navigator.getDisplayMedia, we set up constraints and create a handleSuccess function to wire the screen capture stream to the video tag, as well as a handleError function to log an error to the console if one occurs.
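The setup just described might look like the sketch below. The handleSuccess and handleError names come from the text; the startCapture and capturePhoto helpers, the injected nav and video parameters, and the canvas-based snapshot are assumptions added for illustration and are not taken from the original sample.

```javascript
// Constraints for the capture request (video only).
const constraints = { video: true };

// Wire the screen capture stream to the (autoplaying) video tag.
function handleSuccess(video, stream) {
  video.srcObject = stream;
  return stream;
}

// Log an error to the console if the request fails or is denied.
function handleError(error) {
  console.error("getDisplayMedia error:", error);
}

// `nav` stands in for `navigator`, `video` for the page's <video> element.
function startCapture(nav, video) {
  return nav.getDisplayMedia(constraints).then(
    (stream) => handleSuccess(video, stream),
    handleError
  );
}

// Assumed approach to taking a photo: draw the current video frame onto a
// <canvas> and export it as a PNG data URL.
function capturePhoto(video, canvas) {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d").drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL("image/png");
}
```

On a page, this could be used as startCapture(navigator, document.querySelector("video")), with capturePhoto called once the video has started playing.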

When navigator.getDisplayMedia is called, the picker UI comes up and the user can select whether to share a window or a display.

[Image: the picker UI for Screen Capture in Microsoft Edge, which lets the user select whether to share the entire display or a particular window.]

While being captured, the chosen application or display will have a yellow border drawn around it, which is not included in the capture frame. Application windows being captured will return black frames while minimized (though they will still be enumerated in the picker); if the window is restored, rendering will resume.

If an application window includes a privacy flag (setDisplayAffinity or isScreenCaptureEnabled), the application is not enumerated in the picker. Application windows being captured will not include overlapping content, which is an improvement over snapshotting the entire display and cropping to the window location.

What’s next for Screen Capture

Currently the MediaStream produced by getDisplayMedia can be consumed by the ORTC API in Microsoft Edge. To optimize encoding in screen capture scenarios, the degradationPreference encoding parameter is used. For applications where video motion is limited (e.g. a slideshow presentation), degradationPreference should be set to “maintain-resolution” for best results. To limit the maximum framerate that can be sent over the wire, the maxFramerate encoding parameter can be used.
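As a sketch of how these two parameters might be set, the helper below copies a plain parameters object and applies both. The helper name tuneForScreenShare is hypothetical, and the object layout (degradationPreference at the top level, maxFramerate on each entry of encodings) is an assumption based on the ORTC draft's RTCRtpParameters dictionary, so check it against the Microsoft Edge ORTC documentation before relying on it.

```javascript
// Sketch: return a copy of ORTC-style send parameters tuned for screen share.
// Assumed layout: degradationPreference on the parameters object,
// maxFramerate on each entry of `encodings`.
function tuneForScreenShare(parameters, maxFramerate) {
  const encodings = (parameters.encodings || []).map((encoding) =>
    Object.assign({}, encoding, { maxFramerate: maxFramerate })
  );
  return Object.assign({}, parameters, {
    // Prefer sharp slides over smooth motion when bandwidth is limited.
    degradationPreference: "maintain-resolution",
    encodings: encodings,
  });
}
```

The result would then be handed to the sender when starting to send; the original parameters object is left unmodified so it can be reused for camera video.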

To use the MediaStream with the WebRTC 1.0 API in Microsoft Edge, we recommend the adapter.js library, as we work towards support for getDisplayMedia along with the WebRTC 1.0 object model in a future release.

You can get started with the Screen Capture API in Microsoft Edge today on EdgeHTML 17.17134 or higher, available in the Windows 10 April 2018 Update or through the free virtual machines on the Microsoft Edge Developer Site. Try it out and let us know what you think by reaching out to @MSEdgeDev on Twitter or submitting feedback at https://issues.microsoftedge.com!

– Angelina Gambo, Senior Program Manager, Microsoft Edge
– Bernard Aboba, Principal Architect, Skype

Join the conversation

  1. The main problem with the capture API in Windows (and it exists not only when capturing from Edge) is that when sharing the whole desktop, the taskbar is overlaid with the highlight frame, and the frame hides the underlining of active app icons. So when I’m presenting my screen, I can’t tell which of my apps are running. To a person doing this for the first time, it may look as if all apps were terminated when the presentation started, which can be really confusing.

  2. It is nice to get Screen Capturing, but we are also still waiting for DataChannels, which would make the WebRTC experience complete in Edge.