Starting with the Windows 10 Anniversary Update, Microsoft Edge will support the Speech Synthesis APIs defined in the W3C Web Speech API Specification. These APIs allow websites to convert text to audible speech with customizable voice and language settings. With them, website developers can add and control text-to-speech features specific to their page content and design.
Speech synthesis is useful wherever a page could benefit from narration. Our implementation also supports Speech Synthesis Markup Language (SSML) Version 1.0 to provide further control over the speech output.
Microsoft Edge implements these SpeechSynthesis interfaces:
- SpeechSynthesis: Provides speech playback control and state
- SpeechSynthesisUtterance: Controls speech content, voice and pronunciation
- SpeechSynthesisEvent: Provides state information on the current utterance
- SpeechSynthesisVoice: Describes an available voice and its speech service information
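As a rough illustration of how these interfaces fit together, the sketch below checks for the API and lists the voices it reports. The helper name describeVoices is hypothetical and only exists for this example; speechSynthesis.getVoices() and the name, lang and default properties come from the W3C specification.

```javascript
// Hypothetical helper (not part of the Web Speech API): turn a list of
// SpeechSynthesisVoice-like objects into readable labels.
function describeVoices(voices) {
  return voices.map(function (voice) {
    var label = voice.name + " (" + voice.lang + ")";
    return voice.default ? label + " [default]" : label;
  });
}

// Only touch the API where it exists, so the script is harmless elsewhere.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  console.log(describeVoices(window.speechSynthesis.getVoices()));
}
```

Some browsers populate the voice list asynchronously, so the first call to getVoices() may return an empty list until the voiceschanged event has fired.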
Our implementation of these Speech Synthesis APIs is based on the WinRT Windows.Media.SpeechSynthesis APIs, which directly support most of the W3C Speech Synthesis interfaces. There are a few SpeechSynthesis details that we don’t support in this release and that we’re evaluating for future releases:
- Playback pitch: Used to vary the voice pitch on playback.
- onmark event: Used to indicate that a marked tag has been reached.
- onboundary event: Used to signal boundaries of spoken words or sentences.
Speech Synthesis Demo
To illustrate these new speech features, we’ve published a Speech Synthesis Demo on Test Drive. The demo accepts arbitrary text (try something really long) and exposes parameters such as voice, language, rate and volume so you can tune the resulting speech.
The demo includes sample code that uses SpeechSynthesisUtterance to take your selected text and speech settings and synthesize speech from them.
This sample reads in data from the demo HTML, and then uses window.speechSynthesis.speak to start playback. It shows how simple it is to add basic speech synthesis features to your website.
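The general pattern can be sketched as follows. This is not the demo's own code: the helper names clampSetting, applySpeechSettings and speakText and the shape of the settings object are assumptions made for illustration. The rate and volume limits are the ranges given in the W3C specification.

```javascript
// Hypothetical helper: keep a numeric setting inside [min, max],
// falling back to a default when the value is missing or not a number.
function clampSetting(value, min, max, fallback) {
  if (typeof value !== "number" || isNaN(value)) {
    return fallback;
  }
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy page settings onto an utterance-like object.
function applySpeechSettings(utterance, settings) {
  utterance.text = settings.text;
  if (settings.voice) { utterance.voice = settings.voice; }
  if (settings.lang) { utterance.lang = settings.lang; }
  utterance.rate = clampSetting(settings.rate, 0.1, 10, 1);   // spec range 0.1 to 10
  utterance.volume = clampSetting(settings.volume, 0, 1, 1);  // spec range 0 to 1
  return utterance;
}

// Browser-only: build a real utterance and queue it for playback.
function speakText(settings) {
  var utterance = applySpeechSettings(new SpeechSynthesisUtterance(), settings);
  window.speechSynthesis.speak(utterance);
  return utterance;
}
```

In a page, a call such as speakText({ text: "Hello from Microsoft Edge", lang: "en-US", rate: 1.2, volume: 0.8 }) would queue the text for playback. Keeping the settings logic separate from the speak call makes it easy to check without audio hardware.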
Speech Synthesis Markup Language (SSML)
SSML lets speech voices and content be expressed in XML, giving direct control over a variety of speech characteristics. You can try this by pasting SSML-derived text into the Speech Demo.
If we concatenate the SSML content, we get:
Copy and paste this into the Speech Synthesis Demo text box to see how the voice selections affect the synthesized output.
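For readers who want to generate SSML from script rather than paste it by hand, here is a minimal sketch. It is not the sample from the demo: the helper names escapeXml and buildSsml and the segment format are assumptions for illustration. The speak root element, its namespace and the voice element with xml:lang follow SSML 1.0.

```javascript
// Hypothetical helper: escape characters that are special in XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap each { lang, text } segment in its own
// <voice> element and concatenate them inside one <speak> root.
function buildSsml(documentLang, segments) {
  var body = segments.map(function (segment) {
    return '<voice xml:lang="' + escapeXml(segment.lang) + '">' +
      escapeXml(segment.text) + "</voice>";
  }).join("");
  return '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"' +
    ' xml:lang="' + escapeXml(documentLang) + '">' + body + "</speak>";
}

// Example input; the phrases are placeholders, not taken from the demo.
var exampleSsml = buildSsml("en-US", [
  { lang: "en-US", text: "Good morning." },
  { lang: "fr-FR", text: "Bonjour." }
]);
console.log(exampleSsml);
```

The resulting string can be pasted into the demo text box or assigned as utterance text. Whether a given voice element takes effect depends on the voices installed on the system.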
The language setting in our Test Drive demo works with any voice language pack installed in Windows 10. By default, a system has only its primary language installed; other languages need to be added. Here’s how to add an input language to your PC:
- Go to Settings > Time & language > Region & language.
- Select Add a language.
- Select the language you want to use from the list, then choose which region’s version you want to use. Your download will begin immediately.
Once installed, the language pack will be used to alter the pronunciation of foreign-language text.
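A page can check whether a voice for a given language is available before offering it. The sketch below is one way to do that, assuming voices report BCP 47 tags such as "fr-FR" in their lang property; the helper name findVoiceForLanguage is hypothetical.

```javascript
// Hypothetical helper: pick a voice for a language tag such as "fr-FR".
// Prefers an exact match, then any voice with the same primary language.
function findVoiceForLanguage(voices, langTag) {
  var wanted = langTag.toLowerCase();
  var primary = wanted.split("-")[0];
  var fallback = null;
  for (var i = 0; i < voices.length; i++) {
    var voiceLang = String(voices[i].lang).toLowerCase();
    if (voiceLang === wanted) {
      return voices[i]; // exact match, e.g. "fr-FR"
    }
    if (fallback === null && voiceLang.split("-")[0] === primary) {
      fallback = voices[i]; // same language, different region
    }
  }
  return fallback; // null when no voice for that language is installed
}

// Browser-only usage: report whether a French voice is available.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  var frenchVoice = findVoiceForLanguage(window.speechSynthesis.getVoices(), "fr-FR");
  console.log(frenchVoice ? frenchVoice.name : "No French voice installed");
}
```

If the function returns null, the page can fall back to the default voice or prompt the user to install the language pack using the steps above.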
We’re excited to share this release of HTML5 speech capabilities in Microsoft Edge. We prioritized Speech Synthesis based on feedback from users and developers, and we look forward to refining our speech support in the future with speech synthesis feature enhancements and speech recognition capabilities.
Try it out and let us know what you think!
– Steve Becker, Senior Software Engineer
– Jerry Smith, Senior Program Manager