June 1, 2016 10:00 am

Introducing the Speech Synthesis API in Microsoft Edge


Starting with the Windows 10 Anniversary Update, Microsoft Edge will support the Speech Synthesis APIs defined in the W3C Web Speech API Specification. These APIs allow websites to convert text to audible speech with customizable voice and language settings. With them, website developers can add and control text-to-speech features specific to their page content and design.

Speech Synthesis is useful whenever narration might be applied. Our implementation also supports Speech Synthesis Markup Language (SSML) Version 1.0 to provide further control over the speech output.

Speech Synthesis is enabled by default in Windows Insider Preview builds starting with EdgeHTML 14.14316 – try it out with our new Speech Synthesis Demo on Test Drive!

API Overview

The Web Speech API Specification defines a SpeechSynthesisUtterance interface that lets JavaScript set the speech text, along with attributes that select the voice and control the language, volume, rate and pitch of the spoken output. Other interfaces allow playback control and monitoring the state of the synthesized speech.

Microsoft Edge implements these SpeechSynthesis interfaces:

  • SpeechSynthesis: Provides speech playback control and state
  • SpeechSynthesisUtterance: Controls speech content, voice and pronunciation
  • SpeechSynthesisEvent: Provides state information on the current utterance
  • SpeechSynthesisVoice: Describes an available speech service voice
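To make the division of labor concrete, here is a minimal sketch of enumerating SpeechSynthesisVoice entries. The describeVoice helper is our own illustration, not part of the API:

```javascript
// Format a voice entry for display. The object shape mirrors
// SpeechSynthesisVoice (name, lang, default).
function describeVoice(voice) {
  return voice.name + ' (' + voice.lang + ')' +
         (voice.default ? ' [default]' : '');
}

// Browser usage (illustrative): list the installed voices. Some engines
// populate the list asynchronously, hence the voiceschanged handler.
// window.speechSynthesis.onvoiceschanged = function () {
//   window.speechSynthesis.getVoices().forEach(function (v) {
//     console.log(describeVoice(v));
//   });
// };
```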

Our implementation of these Speech Synthesis APIs is based on the WinRT Windows.Media.SpeechSynthesis APIs, which directly support most of the W3C Speech Synthesis interfaces. There are a few SpeechSynthesis details that we don’t support in this release, which we’re evaluating for future releases:

  • Playback pitch: Used to vary the voice pitch on playback.
  • onmark event: Used to indicate that an SSML mark tag has been reached.
  • onboundary event: Used to signal boundaries of spoken words or sentences.
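Given these gaps, pages that rely on pitch or the mark/boundary events may want to feature-detect before depending on them. A hedged sketch (the speechSupport helper is our own; the property checks are the standard JavaScript 'in' test):

```javascript
// Probe which optional SpeechSynthesis details an engine exposes.
// The global window object is taken as a parameter for testability.
function speechSupport(win) {
  var supported = 'speechSynthesis' in win &&
                  'SpeechSynthesisUtterance' in win;
  var utter = supported ? new win.SpeechSynthesisUtterance('') : null;
  return {
    synthesis: supported,
    pitch: supported && 'pitch' in utter,
    onmark: supported && 'onmark' in utter,
    onboundary: supported && 'onboundary' in utter
  };
}

// Browser usage (illustrative):
// var caps = speechSupport(window);
// if (caps.onboundary) { /* safe to highlight words as they are spoken */ }
```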

Speech Synthesis Demo

To illustrate these new speech features, we’ve published a Speech Synthesis Demo on Test Drive. It accepts arbitrary text input (try something really long) and exposes parameters like voice, language, rate and volume that allow tuning of the resulting speech.

The demo includes sample code that uses SpeechSynthesisUtterance to take your selected text and speech settings and synthesize them into spoken audio.

This sample reads in data from the demo HTML, and then uses window.speechSynthesis.speak to start playback. It shows how simple it is to add basic speech synthesis features to your website.
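The listing itself isn’t reproduced in this copy of the post, but the pattern looks roughly like the sketch below. The collectSettings helper and the form field names are illustrative, not the demo’s actual code; the rate and volume clamps follow the ranges the specification defines (0.1–10 and 0–1):

```javascript
// Gather speech settings from form values into a plain object,
// clamping rate and volume to their specified ranges.
function collectSettings(form) {
  var rate = Number(form.rate);
  var volume = Number(form.volume);
  return {
    text: form.text,
    lang: form.lang || 'en-US',
    rate: Math.min(10, Math.max(0.1, isNaN(rate) ? 1 : rate)),
    volume: Math.min(1, Math.max(0, isNaN(volume) ? 1 : volume))
  };
}

// Browser usage (illustrative):
// var s = collectSettings({ text: textBox.value, lang: langSelect.value,
//                           rate: rateSlider.value, volume: volSlider.value });
// var utterance = new SpeechSynthesisUtterance(s.text);
// utterance.lang = s.lang;
// utterance.rate = s.rate;
// utterance.volume = s.volume;
// window.speechSynthesis.speak(utterance);
```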

Speech Synthesis Markup Language (SSML)

SSML allows speech voices and content to be expressed in XML, giving direct control over a variety of speech characteristics. You can try this by pasting SSML-derived text into the Speech Demo.

Here’s an example of JavaScript SSML from the W3C spec:
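The original listing hasn’t survived in this copy; the snippet below is an illustrative reconstruction in the spirit of the spec’s example rather than its verbatim text, and the voice name and content are made up:

```javascript
// Build an SSML 1.0 document as concatenated JavaScript strings.
var ssml =
  '<?xml version="1.0"?>' +
  '<speak version="1.0" ' +
  'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">' +
  '<voice name="Microsoft Zira">Hello, ' +
  '<prosody rate="slow">this part is spoken slowly.</prosody>' +
  '</voice>' +
  '</speak>';

// Per the specification, the utterance text may be a well-formed SSML
// document instead of plain text:
// var utterance = new SpeechSynthesisUtterance(ssml);
// window.speechSynthesis.speak(utterance);
```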

If we concatenate the SSML content, we get:
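The concatenated result is not preserved in this copy; an SSML 1.0 document of the kind the demo accepts looks like this (voice name and text are illustrative):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <voice name="Microsoft Zira">Hello,
    <prosody rate="slow">this part is spoken slowly.</prosody>
  </voice>
</speak>
```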

Copy and paste this into the Speech Synthesis Demo text box to see how the voice selections affect the synthesized output.

Languages

The language setting in our Test Drive demo works with any voice language pack installed in Windows 10. A system comes with a primary language installed by default; others must be added. Here’s how to add an input language to your PC:

  • Go to Settings > Time & language > Region & language.
  • Select Add a language.
  • Select the language you want to use from the list, then choose which region’s version you want to use. Your download will begin immediately.

Once installed, the language pack will be used to alter the pronunciation of foreign language text.
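Once a pack is installed, a page can route text to it by matching the desired language against the installed voices. A hedged sketch (findVoiceForLang is our own helper; the voice objects mirror SpeechSynthesisVoice):

```javascript
// Return the first voice whose BCP 47 tag starts with the requested
// language prefix (e.g. 'fr' matches 'fr-FR'), or null if no matching
// language pack is installed.
function findVoiceForLang(voices, lang) {
  var prefix = lang.toLowerCase();
  for (var i = 0; i < voices.length; i++) {
    if (voices[i].lang.toLowerCase().indexOf(prefix) === 0) {
      return voices[i];
    }
  }
  return null;
}

// Browser usage (illustrative):
// var voice = findVoiceForLang(window.speechSynthesis.getVoices(), 'fr');
// if (voice) {
//   utterance.voice = voice;
//   utterance.lang = voice.lang;
// }
```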

We’re excited to share this release of HTML5 speech capabilities in Microsoft Edge. We prioritized Speech Synthesis based on feedback from users and developers, and we look forward to refining our speech support in the future with speech synthesis feature enhancements and speech recognition capabilities.

Try it out and let us know what you think!

– Steve Becker, Senior Software Engineer
– Jerry Smith, Senior Program Manager

Join the conversation

  1. re: test page – https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/

    When testing with Canary (I don’t have the latest build yet), the dropdown voices combo is not being cleared between each call to getVoices, resulting in duplicate listings of voices.

    I also expected that getVoices would be filtered by the selected language code (e.g. filter for language values that begin with 'en' to list en-US or en-UK local voices), so that only locally installed voices that support the selected language code are displayed in the voice dropdown.

    Language codes offered do not match those used by chromium. Here is a transcription of narrator voices with the matching language codes from chromium getVoices.
    var narratorVoices = [
        { 'lang': 'en-US', 'gender': 'Female', 'name': 'Zira', 'displayname': 'Microsoft Zira (en-US, female)' },
        { 'lang': 'en-US', 'gender': 'Male', 'name': 'David', 'displayname': 'Microsoft David (en-US, male)' },
        { 'lang': 'en-GB', 'gender': 'Female', 'name': 'Hazel', 'displayname': 'Microsoft Hazel (en-GB, female)' },
        { 'lang': 'en-IN', 'gender': 'Female', 'name': 'Heera', 'displayname': 'Microsoft Heera (en-IN, female)' },
        { 'lang': 'fr-FR', 'gender': 'Female', 'name': 'Hortense', 'displayname': 'Microsoft Hortense (fr-FR, female)' },
        { 'lang': 'de-DE', 'gender': 'Female', 'name': 'Hedda', 'displayname': 'Microsoft Hedda (de-DE, female)' },
        { 'lang': 'es-ES', 'gender': 'Female', 'name': 'Helena', 'displayname': 'Microsoft Helena (es-ES, female)' },
        { 'lang': 'zh-CN', 'gender': 'Female', 'name': 'Huihui', 'displayname': 'Microsoft Huihui (zh-CN, female)' },
        { 'lang': 'zh-HK', 'gender': 'Female', 'name': 'Tracy', 'displayname': 'Microsoft Tracy (zh-HK, female)' },
        { 'lang': 'zh-TW', 'gender': 'Female', 'name': 'Hanhan', 'displayname': 'Microsoft Hanhan (zh-TW, female)' },
        { 'lang': 'jp-JP', 'gender': 'Female', 'name': 'Haruka', 'displayname': 'Microsoft Haruka (ja-JP, female)' },
        { 'lang': 'ko-KR', 'gender': 'Female', 'name': 'Heami', 'displayname': 'Microsoft Heami (ko-KR, female)' },
        { 'lang': 'es-MX', 'gender': 'Female', 'name': 'Sabina', 'displayname': 'Microsoft Sabina (es-MX, female)' },
        { 'lang': 'it-IT', 'gender': 'Female', 'name': 'Elsa', 'displayname': 'Microsoft Elsa (it-IT, female)' },
        { 'lang': 'ru-RU', 'gender': 'Female', 'name': 'Irina', 'displayname': 'Microsoft Irina (ru-RU, female)' },
        { 'lang': 'pl-PL', 'gender': 'Female', 'name': 'Paulina', 'displayname': 'Microsoft Paulina (pl-PL, female)' },
        { 'lang': 'pt-BR', 'gender': 'Female', 'name': 'Maria', 'displayname': 'Microsoft Maria (pt-BR, female)' }
    ];

    Regards.

  2. Cool!
    One thing that bothers me is that “HTML5 speech capabilities” in the last paragraph – how is the Web Speech API related to HTML5? Other than being published by the W3C website (as a community group report versus as a recommendation), I do not remember seeing a connection to HTML5…

  3. Thanks for sharing your solution.
    One small issue: Chrome keeps loading voices into the dropdown list when onvoiceschanged is triggered.
    Here is a fix:
    // Variables
    var voices = []; // I use this in more than one place

    var loadVoices = function () {
        var voicesLoaded = alreadyLoaded();
        // Chrome keeps reloading the voices; load only once
        if (!voicesLoaded) {
            voices = Speaker.getVoices();

            voices.forEach((voice) => {
                var option = document.createElement('option');
                option.value = voice.name;
                option.innerHTML = voice.name;
                voiceSelect.appendChild(option);
            });
        }
    };
    var alreadyLoaded = function () {
        // Any options in the select element?
        var optionCount = voiceSelect.length;
        return optionCount > 0;
    };