.net, Accessibility, Speech Recognition, UWP

Speech recognition with C# and the Raspberry Pi

Last time, I wrote about how to integrate Cortana into a UWP app, so that you can use voice commands to start your app on a Windows Phone device.

This time, I’m going to write about how to control a Raspberry Pi with voice commands, and program your UWP app in C# to respond to those instructions. This has the potential to really transform the accessibility of driving events in your UWP apps.

Creating the grammar specification file

The .NET framework provides some pretty advanced speech recognition capabilities out of the box – these APIs make integrating grammar specifications into your app very simple. The more complex part is creating the grammar file itself.

Microsoft have an excellent introduction to creating these files on MSDN here. Reading MSDN, and augmenting it with the example on Wikipedia here, really helped me get started.

I’ve started creating my Speech Recognition Grammar Specification (SRGS), which describes “automationCommands” below:

<?xml version="1.0" encoding="utf-8" ?>
<grammar
  version="1.0"
  xml:lang="en-US"
  root="automationCommands"
  xmlns="http://www.w3.org/2001/06/grammar"
  tag-format="semantics/1.0">
  
  <!-- SRGS instructions here -->
 
</grammar>

For the purposes of this article, I want my Raspberry Pi to recognise verbal instructions to control a vehicle. I’m likely to command the vehicle to move forward or backward, and I want to use a few different verbs to describe the action of movement. For example, I want the commands below to work:

  • Move forward
  • Go forwards
  • Turn back

It’s quite easy to see the structure of the sentence, in that there’s a verb which describes the move action (move, go, turn) and then an adverb for the direction (forward, forwards, backward, backwards, back). Therefore, our grammar specification starts to look like this:

<rule id="automationCommands">
  <item>
    <item>
      <ruleref uri="#moveAction" />
      <tag> out.command=rules.latest(); </tag>
    </item>
    <item>
      <ruleref uri="#direction" />
      <tag> out.direction=rules.latest(); </tag>
    </item>
  </item>
</rule>

When the .NET speech recognition engine interprets the voice commands, it will store the instruction it hears within a dictionary object, with keys of “command” and “direction” – you can see these in the <tag> nodes above.
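As a rough sketch of how those values can later be read back out of a recognition result in C# (this uses the Windows.Media.SpeechRecognition types that appear later in this post – the helper method name is just for illustration):

// Sketch only - SemanticInterpretation.Properties is a dictionary keyed by the
// names used in the SRGS <tag> nodes ("command" and "direction").
// Uses the Windows.Media.SpeechRecognition and System.Collections.Generic namespaces.
private static string GetTagValue(SpeechRecognitionResult result, string key)
{
    IReadOnlyList<string> values;

    if (result.SemanticInterpretation.Properties.TryGetValue(key, out values))
    {
        return values.Count > 0 ? values[0] : null;
    }

    return null;
}

So, for example, GetTagValue(result, "command") would return “MOVE” for any of the verbs defined in the grammar.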

So I now need to describe the rules for the automation commands “moveAction” and “direction”. Let’s look at “moveAction” first.

When the recognition engine hears me say the words “move”, “go” or “turn”, I want the engine to recognise this as an instruction to move. I would like to translate all of these verbal instructions into just one verb – move. This is much better than having to program my application to handle many different words (move, turn, go) which describe the same action (move). I can do this by defining a <tag> within a rule that matches any one of a number of different words, as shown below.

<rule id="moveAction">
  <one-of>
    <item>
      <tag> out="MOVE"; </tag>
      <one-of>
        <item>move</item>
        <item>turn</item>
        <item>go</item>
      </one-of>
    </item>
  </one-of>
</rule>

The rule relating to “direction” follows a similar pattern, but this rule has two output tags – one for forward and one for backward.

<rule id="direction">
  <item>
    <one-of>
      <item>
        <tag> out="FORWARD"; </tag>
        <one-of>
          <item>forward</item>
          <item>forwards</item>
        </one-of>
      </item>
      <item>
        <tag> out="BACKWARD"; </tag>
        <one-of>
          <item>backward</item>
          <item>back</item>
          <item>backwards</item>
        </one-of>
      </item>
    </one-of>
  </item>
</rule>

So the whole SRGS file – defining the grammar required – is shown below. This is also available on GitHub here.

<?xml version="1.0" encoding="utf-8" ?>
<grammar
  version="1.0"
  xml:lang="en-US"
  root="automationCommands"
  xmlns="http://www.w3.org/2001/06/grammar"
  tag-format="semantics/1.0">
 
  <rule id="automationCommands">
    <item>
      <item>
        <ruleref uri="#moveAction" />
        <tag> out.command=rules.latest(); </tag>
      </item>
      <item>
        <ruleref uri="#direction" />
        <tag> out.direction=rules.latest(); </tag>
      </item>
    </item>
  </rule>
 
  <rule id="moveAction">
    <one-of>
      <item>
        <tag> out="MOVE"; </tag>
        <one-of>
          <item>move</item>
          <item>turn</item>
          <item>go</item>
        </one-of>
      </item>
    </one-of>
  </rule>
 
  <rule id="direction">
    <item>
      <one-of>
        <item>
          <tag> out="FORWARD"; </tag>
          <one-of>
            <item>forward</item>
            <item>forwards</item>
          </one-of>
        </item>
        <item>
          <tag> out="BACKWARD"; </tag>
          <one-of>
            <item>backward</item>
            <item>back</item>
            <item>backwards</item>
          </one-of>
        </item>
      </one-of>
    </item>
  </rule>
</grammar>

Implementing the UWP app in C#

I created a new Windows 10 UWP app in Visual Studio, and added a project reference to the Windows IoT Extensions for the UWP (shown below).

[Screenshot: adding a project reference to the Windows IoT Extensions for the UWP in Visual Studio]

I also added a NuGet reference to a package I created to simplify coding for speech recognition – Magellanic.Speech.Recognition. I added it using the command below from the package manager console.

Install-Package Magellanic.Speech.Recognition -Pre
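For context, the raw UWP speech APIs in Windows.Media.SpeechRecognition can load and compile an SRGS grammar file with code roughly like the sketch below – the package saves me writing this kind of plumbing each time (the file name and method name here are just placeholders).

// A rough sketch of the equivalent plumbing with the raw UWP APIs - this is
// only an illustration, not the package's source code. Uses the Windows.Storage
// and Windows.Media.SpeechRecognition namespaces.
private async Task StartRecognitionAsync()
{
    var grammarStorageFile =
        await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///grammar.xml"));

    var speechRecognizer = new SpeechRecognizer();

    speechRecognizer.Constraints.Add(
        new SpeechRecognitionGrammarFileConstraint(grammarStorageFile));

    await speechRecognizer.CompileConstraintsAsync();

    // in a real app you'd hook up the ResultGenerated event before starting the session
    await speechRecognizer.ContinuousRecognitionSession.StartAsync();
}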

Next, I added handlers for the Loaded and Unloaded events in the app’s MainPage.xaml.cs file.

public MainPage()
{
    this.InitializeComponent();
 
    Loaded += MainPage_Loaded;
 
    Unloaded += MainPage_Unloaded;
}

I added the SRGS XML file to the root of the project with the name grammar.xml, and added a member reference to this and the speech recognition manager to MainPage.xaml.cs.

private const string grammarFile = "grammar.xml";
        
private SpeechRecognitionManager recognitionManager;

Inside the event handler “MainPage_Loaded”, I added the code below – note that this handler needs to be async, because compiling the grammar file is awaited. This compiles the SRGS grammar file, and also adds an event handler for what to do when the speech recognition engine successfully detects and parses a voice command.

private async void MainPage_Loaded(object sender, RoutedEventArgs e)
{
    // initialise the speech recognition manager
    recognitionManager = new SpeechRecognitionManager(grammarFile);

    // register the event for when speech is detected
    recognitionManager
        .SpeechRecognizer
        .ContinuousRecognitionSession
        .ResultGenerated += RecognizerResultGenerated;

    // compile the grammar file
    await recognitionManager.CompileGrammar();
}

The code below shows the implementation of the event handler declared above. I’ve chosen to ignore any results which aren’t recognised with a high level of confidence. You can also see how the two keys of “command” and “direction” – which are defined in the “automationCommands” rule in the SRGS – can be interpreted and used in C# for further processing and action.

private void RecognizerResultGenerated(
    SpeechContinuousRecognitionSession session,
    SpeechContinuousRecognitionResultGeneratedEventArgs args)
{
    // only act if the speech is recognised with high confidence
    if (!args.Result.IsRecognisedWithHighConfidence())
    {
        return;
    }
 
    // interpret key individual parts of the grammar specification
    string command = args.Result.SemanticInterpretation.GetInterpretation("command");
    string direction = args.Result.SemanticInterpretation.GetInterpretation("direction");
 
    // write to debug
    Debug.WriteLine($"Command: {command}, Direction: {direction}");
}

The code for MainPage.xaml.cs is available here.
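I haven’t shown any motor control here, but as a rough sketch of what that further processing might look like on the Pi, the parsed command and direction could be mapped to a GPIO pin using the Windows.Devices.Gpio APIs – the pin number and the wiring below are purely illustrative.

// Sketch only - the pin number and wiring are illustrative assumptions.
// Uses the Windows.Devices.Gpio namespace.
private void ActOnCommand(string command, string direction)
{
    // only handle the single verb produced by the grammar
    if (command != "MOVE")
    {
        return;
    }

    var gpioController = GpioController.GetDefault();

    if (gpioController == null)
    {
        return; // no GPIO controller available on this device
    }

    var motorPin = gpioController.OpenPin(5);
    motorPin.SetDriveMode(GpioPinDriveMode.Output);

    // drive the pin high to move forward, low to move backward
    motorPin.Write(direction == "FORWARD" ? GpioPinValue.High : GpioPinValue.Low);
}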

Hardware used by the Raspberry Pi

The Pi doesn’t have any on-board hardware which can convert voice commands into an electrical signal, so I purchased a small USB microphone.

The image below shows how the Raspberry Pi recognises this device as a USB PnP sound device.

[Screenshot: the Raspberry Pi listing the microphone as a USB PnP sound device]

Finally, in order to use this device, I had to modify the app’s Package.appxmanifest file to add the Microphone capability.

[Screenshot: the Microphone capability enabled in Package.appxmanifest]

I’ve added all of this code to GitHub here.

Testing it out with some voice commands

I added a small LCD device to my Raspberry Pi to show the output of my voice commands. When I say “Move forward”, the device interprets it in the way below – the LCD screen shows how the command is “MOVE” and the direction is “FORWARD”.

[Image: the LCD showing the command “MOVE” and the direction “FORWARD”]

When I say “Turn back”, the device interprets it in the way below. The image shows how the command is “MOVE” and the direction is “BACKWARD”. Notice how the device doesn’t care whether you say “move” or “turn” – it interprets both as the command “MOVE”.

[Image: the LCD showing the command “MOVE” and the direction “BACKWARD”]

This has been a simple introduction to speech recognition in C#, and how to use it with the Raspberry Pi. You can obviously add a great deal more complexity to the SRGS file to make your UWP applications more accessible.

.net, Accessibility, Cortana, UWP, Windows Store Apps

How to integrate Cortana with a Windows 10 UWP app in C#

Over the last few weeks, I’ve been writing a lot about how to use C# with the Raspberry Pi. I’m really interested in different ways that I can use software to interact with the physical world. Another interaction that I’m interested in is using voice commands, and recently I started looking into ways to use Cortana to achieve this. This post is an introduction to asking Cortana to control Windows apps.

In this post, I’ll look at the simple case of setting up a Windows app so that I can ask Cortana to start the app from my phone.

How does Cortana know what to listen for?

There is some seriously advanced technology in the Microsoft Cognitive Services, particularly software like LUIS – but for this simple case, I’ll store the voice commands Cortana listens for in an XML Voice Command Definition (VCD) file.

  • First we need to define a CommandSet – this has name and language attributes. The voice commands will only work for the CommandSet which has a language attribute matching that on the Windows 10 device. So if your Windows device has the language set to en-us, only the CommandSet matching that attribute will be used by Cortana.
  • We also can define an alternative name for the app as a CommandPrefix.
  • To help the user, we can provide an Example command.
  • The most interesting node in the file is Command:
    • Example: Windows shows examples for each individual command, and this node is where we can specify the examples.
    • ListenFor: These are the words Cortana listens for.
    • Feedback: This is what Cortana replies with.
    • Navigate: This is the XAML page that Cortana navigates to when it parses what you’ve said.

The app I’ve modified is my Electronic Resistance Calculator. I’ve added the file below – which I’ve named ‘ResistorCommands.xml’ – to the root of the project.

<?xml version="1.0" encoding="utf-8" ?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.2">
  <CommandSet xml:lang="en-us" Name="EnglishCommands-us">
    <!-- The CommandPrefix provides an alternative name for your app -->
    <CommandPrefix>Resistor</CommandPrefix>
    <!-- The CommandSet Example appears beside your app's name in the global help -->
    <Example>Open</Example>
    <Command Name="OpenCommand">
      <Example>Open</Example>
      <ListenFor>Open</ListenFor>
      <Feedback>You got it!</Feedback>
      <Navigate Target="MainPage.xaml" />
    </Command>
  </CommandSet>
 
  <CommandSet xml:lang="en-gb" Name="EnglishCommands-gb">
    <!-- The CommandPrefix provides an alternative name for your app -->
    <CommandPrefix>Resistor</CommandPrefix>
    <!-- The CommandSet Example appears beside your app's name in the global help -->
    <Example>Open</Example>
    <Command Name="OpenCommand">
      <Example>Open</Example>
      <ListenFor>Open</ListenFor>
      <Feedback>I'm on it!</Feedback>
      <Navigate Target="MainPage.xaml" />
    </Command>
  </CommandSet>
</VoiceCommands>

Adding these voice commands to the Device Definition Manager

The Windows 10 VoiceCommandDefinitionManager is the resource that Cortana uses when trying to interpret the voice commands. It’s very straightforward to get the Voice Command Definition file from application storage, and then install this storage file into the VoiceCommandDefinitionManager.

We need to add those definitions at application start up, which we can do by overriding the OnNavigatedTo method in MainPage.xaml.cs.

private async Task AddVoiceCommandDefinitionsAsync()
{
    var storageFile = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///ResistorCommands.xml"));
    await VoiceCommandDefinitionManager.InstallCommandDefinitionsFromStorageFileAsync(storageFile);
}
        
protected override async void OnNavigatedTo(NavigationEventArgs e)
{
    if (e.NavigationMode == NavigationMode.New)
    {
        await AddVoiceCommandDefinitionsAsync();
    }
}

At this point, we actually have enough code to allow us to ask Cortana to start our app.
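If you also want the app to know which voice command started it – for example, so you can navigate to the page named in the VCD’s Navigate node – you can override OnActivated in App.xaml.cs. A minimal sketch might look like this:

// Sketch only - handles the case where Cortana, rather than a tap on the tile,
// activates the app. Uses the Windows.ApplicationModel.Activation and
// Windows.Media.SpeechRecognition namespaces.
protected override void OnActivated(IActivatedEventArgs args)
{
    base.OnActivated(args);

    if (args.Kind != ActivationKind.VoiceCommand)
    {
        return;
    }

    var commandArgs = (VoiceCommandActivatedEventArgs)args;

    // the rule path contains the Command Name from the VCD, e.g. "OpenCommand"
    var commandName = commandArgs.Result.RulePath[0];

    var rootFrame = Window.Current.Content as Frame;

    if (rootFrame == null)
    {
        rootFrame = new Frame();
        Window.Current.Content = rootFrame;
    }

    // pass the command name through as the navigation parameter
    rootFrame.Navigate(typeof(MainPage), commandName);
    Window.Current.Activate();
}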

Running the app on a Windows 10 device

I added the VCD ResistorCommands.xml file to the root of the Electronic Resistance Calculator project, added the code snippet above to MainPage.xaml.cs, and ran this in debug mode on my Nokia 1520 Windows 10 device.

When I activate Cortana, I can click on the hamburger menu and select Help in the top left to see the list of apps which can be controlled by voice commands. My Electronic Resistance Calculator is available – you can see in the screenshot below that the word “Open” is shown as an example voice command.

[Screenshot: Cortana’s Help list showing the Resistor app with the example command “Open”]

If I click on the Resistor app, the phone shows a list of valid example commands. Because we’re just opening the app, there’s just one example – “Open”. Obviously we can do more complex things than this with a VCD, which I’ll show in a later post.

[Screenshot: the list of example voice commands for the Resistor app]

When I say “Resistor Show”, Cortana recognises this and replies with “I’m on it” – the feedback specified for devices set to have language “en-gb” (which is correct for my device). After a short pause, the app starts.

[Screenshot: Cortana’s response to the voice command, shortly before the app starts]

In a later post, I’ll look at how to use the VCD to issue more complex voice commands.

.net, Accessibility, C# tip, UWP

How to use C# and the Windows.Media.SpeechSynthesis library to make your UWP app talk

This is a short post, on the topic of building speech enabled UWP apps for the Windows Store.

The features available through the Universal Windows Platform are pretty interesting – and also pretty incredible, when you consider you get these APIs for free as long as you’re building an app. One of these features is speech synthesis.

I particularly find this interesting because I’ve been researching some of Microsoft’s Cognitive Services – and one of these services is Text to Speech. These services are not free – at the time of writing, it’s 5000 transactions free per month, and after that it’s $4 per 1000 transactions. This is pretty good value…but free is better. Also, I’ll show you that a lot less code is required for the offline version – you can see the code that’s required to use the online API here.

So in this post, I’ll walk through the steps of how to get an app to talk to you.

Building the UI

First, open VS2015 and create a blank Windows 10 UWP app.

[Screenshot: creating a blank Windows 10 UWP app in Visual Studio 2015]

When the app has been created successfully, I’d like to create a UI where the top two thirds of the screen are used for user-entered text, and the bottom third is a button which will make the device read the text entered.

I can do this by defining two Grid rows – with the top row being twice the size of the bottom row – and then placing a TextBox and a Button into those rows using the code below.
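The row definitions themselves would look something like this minimal sketch:

<!-- A minimal sketch of the row definitions: the top row is twice the height of the bottom row -->
<Grid.RowDefinitions>
    <RowDefinition Height="2*" />
    <RowDefinition Height="*" />
</Grid.RowDefinitions>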

<TextBox
    Grid.Column="0" 
    Grid.Row="0" 
    HorizontalAlignment="Stretch" 
    VerticalAlignment="Stretch"
    Width="Auto" 
    Height="Auto" 
    Name="textToSpeak"
    AcceptsReturn="True"
    Text="Enter text here."/>
<Button 
    Grid.Column="0" 
    Grid.Row="1" 
    HorizontalAlignment="Stretch" 
    VerticalAlignment="Stretch" 
    Width="Auto" 
    Click="Speak_Click">
        Speak
</Button>

Finally, we need to add the magic element – the media element.

<MediaElement Name="media"  Visibility="Collapsed"/>

That’s the XAML part of the project completed.

Writing the code

The code is written to trigger speech synthesis of what’s in the text box when the button is clicked. It’s pretty simple code – we instantiate the SpeechSynthesizer object in the page constructor, and then call a Talk method when the button is clicked. This asynchronous method converts the text to a speech synthesis stream, and then sets the source of the media element to be this stream. Once that’s set, we can call the Play method of the media element to hear the computer talk.

using System;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
 
namespace SpeakingApp
{
    public sealed partial class MainPage : Page
    {
        SpeechSynthesizer speechSynthesizer;
 
        public MainPage()
        {
            InitializeComponent();
            speechSynthesizer = new SpeechSynthesizer();
        }
 
        private void Speak_Click(object sender, RoutedEventArgs e)
        {
            Talk(textToSpeak.Text);
        }
 
        private async void Talk(string message)
        {
            var stream = await speechSynthesizer.SynthesizeTextToStreamAsync(message);
            media.SetSource(stream, stream.ContentType);
            media.Play();
        }
    }
}

And that’s it – very simple code to allow your app to talk to the user. I hope you find this helpful.

.net, Accessibility, Non-functional Requirements, Visual Studio, Visual Studio Plugin, Web Development

How to use the Web Accessibility Checker for Visual Studio to help conform to accessibility guidelines

I’ve previously blogged about accessibility a few times, and I’d love to find a good way to identify accessibility issues from my development environment. So I was really interested to see that Mads Kristensen from Microsoft recently released the Web Accessibility Checker for Visual Studio 2015. This extension uses the aXe-core library for analysing code in Visual Studio.

The Visual Studio Gallery gives some good instructions on how to install and use this extension. It’s a pretty straightforward install – once you run your website, a list of non-conformances will appear in the Error List in VS 2015 (to see the Error List, go to the View Menu and select Error List from there).

Obviously this can’t identify every accessibility problem on your site, so fixing all the errors on this list isn’t going to guarantee your website is accessible. But one of the manifesto items from aXe-core’s github page states the tool aims to report zero false positives – so if aXe-core is raising an error, it’s worth investigating.

Let’s look at an example.

How does it report errors?

I’ve written some HTML code and pasted it below…ok, it’s some pretty ropey HTML code, with some really obvious accessibility issues.

<!DOCTYPE html>
<html>
<body>
    <form>
        This is simple text on a page.
 
        Here's a picture:
        <br />
        <img src="/image.png" />
        <br />
        And here's a button:
        <br />
        <button></button>
    </form>
</body>
</html>


Let’s see what the Web Accessibility Checker picks up:

[Screenshot: the accessibility errors reported in the Visual Studio Error List]

Four errors are reported:

  • No language attribute is specified in the HTML element. This is pretty easy to fix – I’ve blogged about this before;
  • The <button> element has no text inside it;
  • The page has no <title> element;
  • The image does not have an alternative text attribute. (A corrected version of the page is sketched below.)
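For illustration, a version of the page which addresses all four of these errors might look something like the sketch below (the title and alternative text values are just examples):

<!DOCTYPE html>
<html lang="en">
<head>
    <title>A simple example page</title>
</head>
<body>
    <form>
        This is simple text on a page.

        Here's a picture:
        <br />
        <img src="/image.png" alt="An example image" />
        <br />
        And here's a button:
        <br />
        <button>Submit</button>
    </form>
</body>
</html>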

Note – these errors are first reported at application runtime; don’t expect to see them while you’re writing your code, or just after compiling it.

If you want to discover more about any of these errors, the Error List has a column called “Code”, and clicking the text will take you to an explanation of what the problem is.

In addition, you can just double click on the description, and the VS editor focus will move to the line of code where the issue is.

I’ve corrected some of the errors – why are they still in the Error List?

I found that the errors stayed in the list, even after starting to fix the issues. In order to clear the errors away, I found that I needed to right click on the Error List, and from the context menu select “Clear All Accessibility Errors”.

[Screenshot: the “Clear All Accessibility Errors” option in the Error List context menu]

When I hit refresh in my browser, I was able to see the remaining issues without it showing the ones that I had fixed.

What more does this give me when compared to some of the existing accessibility tools?

Previously I’ve used tools like the HTML_CodeSniffer bookmarklet, which also report accessibility errors.

[Screenshot: accessibility errors reported by the HTML_CodeSniffer bookmarklet]

This is a great tool, but it will only point to the issues on the web page – the Web Accessibility Checker in VS2015 has the advantage of taking your cursor straight to the line of source code with the issue.

Conclusion

Obviously you can’t completely test if a website is accessible using automated tools. But you can definitely use tools to check if certain rules are being adhered to in your code. Tools like the Web Accessibility Checker for VS2015 help you identify and locate accessibility issues in your code – and when it’s free, there’s no reason not to use it in your web application development process today.

Accessibility, Non-functional Requirements

Accessibility and images – is it ever ok to not specify alternative text?

It’s good practice to specify alternative text for images using the “alt” attribute – although it’s unfortunately common to see images without it.

To answer the question in the title – you must always specify alternative text for images.

But as with anything on the web, it’s possible to find conflicting arguments and information about this practice – so to try to improve the credibility of the recommendations in this post, I frequently refer back to W3C.org.

Almost every image on a web page should have some text placed in the “alt” attribute (or the longdesc attribute) – and this is a Priority 1 item on the W3C checklist.

Examples:
1. Images used as illustrative content, e.g.

<img src="me.jpg" alt="The article's author" />

2. Images used for spacers and bullets – yes, these are mandatory! Even though the image doesn’t add any real content, you can still specify alternative text, e.g.

<img src="bullet.gif" alt="* " />

3. Images used as links – there are a couple of different ways of doing this, and each must be handled differently. If you provide no link text and the only content of the <a> link is the image, use the “alt” attribute to specify a text equivalent, e.g.

<a href="home.html"><img src="home.gif" alt="Home page"/></a>

In the case where both an image and text are specified as the content of a link, repeating the anchor’s text in the “alt” attribute is unnecessary – W3C mandate using a space in this instance, e.g.

<a href="home.html"><img src="home.gif" alt=" "/>Home</a>

So the “alt” attribute still has some text, even though it’s just a space – you mustn’t omit the “alt” attribute.

What about background images in CSS?

I haven’t been able to find information on W3C about how alternative text for background images should be treated. Christian Heilmann argues that images in CSS should be purely aesthetic, and therefore don’t need alternative text. I definitely agree with his argument – your page still needs to make sense with CSS switched off.

But what if you are supporting a site where someone has put background images with semantic value into the CSS? Well, you could change the code to bring this data into an <img> tag, and specify an “alt” value in your HTML rather than CSS. But sometimes this might not be possible – in this case, the Yahoo Developer Network recommends using ARIA roles, which enable screen readers to recognise your ARIA-enhanced element as an image.

<div role="img" aria-label="The article's author">

I hope this article helps you improve the accessibility of your site.

Accessibility, Non-functional Requirements

Accessibility – specifying the language of your page

I’m going to write a few posts on specific things you can do to improve the accessibility of a webpage.

You can identify the primary natural language of a document by making a simple change to your HTML element:

<HTML lang="en">

That’s it – now your site complies with Checkpoint 4.3 of the W3C Accessibility checklist by identifying the primary language of a document.

This very simple tip improves the accessibility of your page – it’s presently Priority 3, so it’s only mandatory if you’re aiming at AAA compliance. But it’s good practice, a small change and low risk – why not add it to your master page?

Accessibility, Continuous Integration, Non-functional Requirements

Accessibility and Continuous Integration

There are some great tools out there already to test if your page conforms to accessibility standards – HTML_CodeSniffer is one of the best I’ve seen – but I can’t run this as part of my automated CI process.

There are some tools that allow you to submit a URL, such as the WAVE tool on the WebAIM site, but if you’re developing a site on an intranet, or you’re working with confidential data, this isn’t useful either.

From the W3C WAI Tools list, I discovered AccessLint, which audits your code using javascript from Google’s Accessibility Developer Tools. This posting is a quick recipe for how to run this against a web page from the command line using Windows.

  1. Download PhantomJS, extract to somewhere on your hard drive, and add the binary’s path to your environment variables.
    • Grab Phantom JS from here.
    • I extracted the zip file to C:\PhantomJS_2.0.0 so the actual PhantomJS.exe sits in C:\PhantomJS_2.0.0\bin. I created an environment variable called PHANTOM_JS, and then added “%PHANTOM_JS%\bin” to the end of my PATH.
  2. Download the installer for Ruby and run it.
  3. Download the Access Lint code.
    • Grab the Access Lint code from here. Pull using Git, or download the zip – either way works.
    • I have the code in C:\Access_Lint so I can see the access_lint ruby file in C:\Access_Lint\bin.
  4. Install the rubygem.
    • Open a command prompt, browse to where the access_lint ruby file is saved (as above, I have it in C:\Access_Lint\bin), and enter:
gem install access_lint

And we’re ready to go!

Now you can open a command prompt, and enter a command like:

access_lint audit http://w3.org

The audit output will render to the command prompt window as JSON.

You can now check the accessibility of a web page on your integration server as part of a CI process.

Criticisms

The process isn’t perfect.

  • To test each page, you’d have to audit it individually – it would be better if we could crawl the site. (We could work around this by running a batch of audit commands, specifying each page to be audited in a separate line – see the sketch after this list).
  • The JSON output has some odd artefacts – it uses “=>” instead of “:”, and the first and last lines in the file are console logs. (We could work around this by doing some simple post processing on the output).
  • JSON isn’t particularly readable. (We could work around this by using the JSON as a data source, and using another tool to render results in a more readable format.)
  • And most significantly, if this tool doesn’t report failures, it doesn’t mean your page is accessible. (No real workarounds beyond manually checking the page.)
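As a rough illustration of that first workaround, a simple batch file could audit several pages in one CI step and capture the output – the URLs and file names here are just examples:

REM audit_pages.bat - run an AccessLint audit against each page in turn
access_lint audit http://example.com/ > audit_home.json
access_lint audit http://example.com/contact > audit_contact.json
access_lint audit http://example.com/about > audit_about.json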

But this is the starting point on a journey. Accessibility can sometimes be an afterthought. It should be a natural part of the development process, and part of a team’s definition of done.