
How to use C# and the Windows.Media.SpeechSynthesis library to make your UWP app talk

This is a short post on building speech-enabled UWP apps for the Windows Store.

The features available through the Universal Windows Platform are pretty interesting – and also pretty incredible when you consider that you get these APIs for free, as long as you're building a UWP app. One of these features is speech synthesis.

I find this particularly interesting because I've been researching some of Microsoft's Cognitive Services – and one of these services is Text to Speech. These services aren't free: at the time of writing, the first 5,000 transactions per month are free, and after that it's $4 per 1,000 transactions. That's pretty good value…but free is better. Also, a lot less code is required for the offline, in-app version – you can see the code that's required to use the online API here.

So in this post, I’ll walk through the steps of how to get an app to talk to you.

Building the UI

First, open VS2015 and create a blank Windows 10 UWP app.


When the app has been created successfully, I’d like to create a UI where the top two thirds of the screen are used for user-entered text, and the bottom third is a button which will make the device read the text entered.

I can do this by defining Grid rows using the code below – this splits the screen into two rows, with the top row being twice the size of the bottom row.
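A minimal sketch of those row definitions, assuming the TextBox and Button that follow sit directly inside the page's root Grid:

<Grid.RowDefinitions>
    <RowDefinition Height="2*"/>
    <RowDefinition Height="*"/>
</Grid.RowDefinitions>

The TextBox and Button are then placed into these rows: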

<TextBox
    Grid.Column="0" 
    Grid.Row="0" 
    HorizontalAlignment="Stretch" 
    VerticalAlignment="Stretch"
    Width="Auto" 
    Height="Auto" 
    Name="textToSpeak"
    AcceptsReturn="True"
    Text="Enter text here."/>
<Button 
    Grid.Column="0" 
    Grid.Row="1" 
    HorizontalAlignment="Stretch" 
    VerticalAlignment="Stretch" 
    Width="Auto" 
    Click="Speak_Click">
        Speak
</Button>

Finally, we need to add the magic element – the MediaElement. It's never visible on screen (hence Visibility="Collapsed"), but it's what actually plays the audio stream that the speech synthesizer produces.

<MediaElement Name="media"  Visibility="Collapsed"/>

That’s the XAML part of the project completed.

Writing the code

The code triggers speech synthesis of whatever is in the text box when the button is clicked. It's pretty simple: we instantiate the SpeechSynthesizer object in the page constructor, and then call a Talk method from the button's click handler. This asynchronous method converts the text to a SpeechSynthesisStream, sets that stream as the source of the MediaElement, and then calls the MediaElement's Play method to hear the computer talk.

using System;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
 
namespace SpeakingApp
{
    public sealed partial class MainPage : Page
    {
        SpeechSynthesizer speechSynthesizer;
 
        public MainPage()
        {
            InitializeComponent();
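            // Create the speech synthesizer once, so it can be reused for every button click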
            speechSynthesizer = new SpeechSynthesizer();
        }
 
        private void Speak_Click(object sender, RoutedEventArgs e)
        {
            Talk(textToSpeak.Text);
        }
 
        private async void Talk(string message)
        {
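            // Convert the text to an audio stream, then play it through the hidden MediaElement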
            var stream = await speechSynthesizer.SynthesizeTextToStreamAsync(message);
            media.SetSource(stream, stream.ContentType);
            media.Play();
        }
    }
}

And that’s it – very simple code to allow your app to talk to the user. I hope you find this helpful.