Computer Vision, Kinect

Getting started with Kinect for Windows – installing, troubleshooting and running samples

As I have been writing my series on using sensors with Windows and .NET, it occurred to me that I actually had a pretty amazing set of sensors sitting unused – the Kinect device from my Xbox One.

I don’t have any games that really use the Kinect, and with some of the refinements to the console operating system, I’ve found it easier to use the gamepad than voice commands.

I knew that the Kinect for the Xbox 360 wasn’t compatible with my PC, and that there’s actually a separate Kinect for Windows device – but after a little research, I found that the Kinect for the Xbox One could work with a PC, as long as it was connected through an adapter.

I want to write about my experiences – as usual, things didn’t work quite as smoothly as I’d have liked! Hopefully my experiences will be useful to someone else out there.

Installing the Kinect Software SDK

As I mentioned above, my Kinect wouldn’t connect to a PC directly – it needed to go through an adapter. There were some other system requirements listed at this link.

If I were starting this process again, I would install the SDK (available here) before purchasing the Kinect Adapter. The SDK ships with a tool which analyses your PC for compatibility, and it would have told me if my PC wasn’t good enough. Fortunately it was, but it could have been an expensive mistake if I’d bought the adapter and then found my machine wasn’t up to the job.

Step 1: Download the Kinect for Windows SDK 2.0

The SDK is freely available for download from this link.

When the SDK has downloaded, double-click on the executable to start the installation process. The first thing you’ll be asked to do is accept the licence agreement and click Install.

[Screenshot: licence agreement and Install button]

The install is pretty straightforward, and if it installs successfully, it will finish on the screen below:

[Screenshot: installation complete]

At this point, it’s possible to load up the verification tool – there will be a new Windows app called “SDK Browser 2.0 (Kinect for Windows)”. You can search for this through the Windows Start menu. If you start this up, you’ll see a screen something like the one below.

[Screenshot: SDK Browser 2.0 (Kinect for Windows)]

You can see at the top of the list, there’s a component named “Kinect Configuration Verifier”, which has a “Run” button on the right hand side. If you click on Run, you’ll see a screen like the one below.

[Screenshot: Kinect Configuration Verifier starting]

After a few seconds, this should change to a screen like the one below:

[Screenshot: Kinect Configuration Verifier results]

Hopefully your machine will have green ticks against everything – on my system, I don’t have a Kinect connected yet, so the “Kinect Connected” test has failed, as has the “Verify Kinect Depth and Color Streams” test.

Step 2: Connecting the Kinect Device

I set up my Kinect device in the configuration shown on the Microsoft site, also displayed below. I plugged this into the electrical power supply, and also connected the USB cable to a USB 3.0 port on my PC.

Using USB 3.0 is really important – I also tried it with a USB 2 socket, and this didn’t work.

[Image: Kinect for Windows adapter configuration, from the Microsoft site]

There were some alerts as Windows installed the drivers for the Kinect, but I was able to check that the Kinect had installed correctly by looking at the Device Manager. There was a new node for Kinect sensor devices, shown below:

[Screenshot: Device Manager – Kinect sensor devices node]

Under “Sound, video and game controllers”, there was now a new item called “Xbox NUI Sensor”, shown below:

[Screenshot: Device Manager – Xbox NUI Sensor]

And finally, under “Audio inputs and outputs”, there was a new item called “Microphone Array (Xbox NUI Sensor)”.

[Screenshot: Device Manager – Microphone Array (Xbox NUI Sensor)]

After this point, I re-ran the verification tool, expecting different results now that the Kinect sensor was connected to the machine. The results are shown below.

[Screenshot: verifier results with the Kinect connected]

Unfortunately one of the tests failed – “Verify Kinect Depth and Color Streams”. This was very strange, as when I expanded the item to see more details, I was clearly able to see the output from the Kinect sensor, with a frame rate varying between 20 and 30 FPS.

Other people have not been so lucky. I’ve included some links below which might be helpful.

Troubleshooting and Common Issues Guide

Error: Verify Kinect Depth and Color Streams

Red Mark on “Graphic Processor” and “Verify Kinect Depth and Color Streams”

Testing the Kinect out with some of the samples

Obviously I had some concerns that my Kinect wasn’t going to work, given that one of the verification tests had failed, but I pressed on with some experiments.

Face Basics-WPF

The first application I tested was “Face Basics-WPF”. This demonstrates how to use the FaceFrameReader to obtain information about the faces that the Kinect sees.

After running the application, I found that my face was detected if I stood back a couple of metres from the sensor – I’ve shown the output below. Basically everything detected in the image is correct – although it’s a shame it didn’t detect me as being happy! This value changed depending on whether I was smiling or not, which shows the level of detail that the Kinect and its software are able to pick up.

[Screenshot: Face Basics-WPF output]

I found that this application was installed on my machine in the location:

C:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\Samples\Managed\FaceBasics-WPF
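Out of interest, the heart of this sample boils down to just a few calls into the Kinect SDK. The snippet below is a minimal sketch of the pattern rather than the sample’s actual code – it assumes references to the Microsoft.Kinect and Microsoft.Kinect.Face libraries, and it passes an initial tracking id of 0 for brevity (the real sample takes this id from a paired BodyFrameReader).

using System;
using Microsoft.Kinect;
using Microsoft.Kinect.Face;

// Open the default Kinect sensor.
var sensor = KinectSensor.GetDefault();
sensor.Open();

// Create a face frame source, requesting only the features we want to read back.
var faceSource = new FaceFrameSource(sensor, 0,
    FaceFrameFeatures.BoundingBoxInColorSpace | FaceFrameFeatures.Happy);

// Open a reader and handle face frames as they arrive.
var faceReader = faceSource.OpenReader();
faceReader.FrameArrived += (s, e) =>
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        var result = frame?.FaceFrameResult;
        if (result != null)
        {
            // DetectionResult is Yes, No, Maybe or Unknown.
            Console.WriteLine($"Happy: {result.FaceProperties[FaceProperty.Happy]}");
        }
    }
};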

Discrete Gesture Basics-WPF

The next application I tested was the “Discrete Gesture Basics-WPF” application. This uses the VisualGestureBuilderFrame object to detect people in front of the sensor, and also to track gestures from these people. In the screenshot below, I’m standing a couple of metres in front of the Kinect, with my left hand open and my right hand closed (you can see the different way each hand is displayed – the open hand is green and the closed hand is red). It’s a pretty good recognition of a person standing.

[Screenshot: Discrete Gesture Basics-WPF tracking a standing body]

When I moved closer to the Kinect, and sat down in front of it, the Kinect correctly saw that I was seated, but I was too close to the Kinect for it to display a useful representation.

[Screenshot: Discrete Gesture Basics-WPF with me seated, too close to the sensor]

Again, I found this application installed to my hard disk at the location below:

C:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\Samples\Managed\DiscreteGestureBasics-WPF
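For reference, a rough sketch of the pattern this sample uses is below. Again, this isn’t the sample’s own code – it assumes a gesture database (a .gbd file built with the Visual Gesture Builder tool; the path below is illustrative), and as with the face sketch above, it uses an initial tracking id of 0 rather than one taken from a BodyFrameReader.

using System;
using Microsoft.Kinect;
using Microsoft.Kinect.VisualGestureBuilder;

var sensor = KinectSensor.GetDefault();
sensor.Open();

// Load the pre-trained gestures from a database file (hypothetical path).
var database = new VisualGestureBuilderDatabase(@"Database\Seated.gbd");

// Create a gesture frame source for a tracked body, and register the gestures.
var gestureSource = new VisualGestureBuilderFrameSource(sensor, 0);
gestureSource.AddGestures(database.AvailableGestures);

// Read discrete gesture results as frames arrive.
var gestureReader = gestureSource.OpenReader();
gestureReader.FrameArrived += (s, e) =>
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        var results = frame?.DiscreteGestureResults;
        if (results == null) return;

        foreach (var pair in results)
        {
            Console.WriteLine($"{pair.Key.Name}: detected={pair.Value.Detected}, confidence={pair.Value.Confidence}");
        }
    }
};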

Conclusion

I’ve successfully tested the Kinect sensor for the Xbox One with a PC, and found the process of getting it working was actually really straightforward. I had to purchase a Kinect Adapter, which cost me about £33, to get this to work. I’m looking forward to starting some more development projects involving computer vision and speech recognition.

.net, Computer Vision, UWP, Visual Studio, Windows Store Apps

How to use the camera on your device with C# in a UWP application: Part #3, saving a picture

Previously in this series, we looked at how to preview your device’s camera output, and how to use a physical button to focus the camera.

This time I’d like to look at how to capture an image, and store it in a local device folder.

Adding the capability to save to the pictures folder

If you want to save pictures to one of the many standard Windows folders, you need to add this capability to the package manifest. In the VS2015 project which we’ve been building over the last two parts of this series, double click on the Package.appxmanifest file. In the list of capabilities, tick the box with the text “Pictures Library”.

[Screenshot: the Pictures Library capability in the manifest editor]

Our application is now allowed to save to the Pictures library on our device.
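If you prefer to edit the manifest XML directly, ticking that box is equivalent to adding a capability element like the sketch below (the uap namespace prefix may vary depending on your manifest’s schema version):

<Capabilities>
  <uap:Capability Name="picturesLibrary" />
</Capabilities>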

Capture an image using the device button

In part 2, we set up our app to make the camera focus when the button is half pressed – after it has focussed, we’d like to fully press the button to capture the image presently being previewed. To do this, we need to handle the CameraPressed event in our code.

if (ApiInformation.IsTypePresent("Windows.Phone.UI.Input.HardwareButtons"))
{
    HardwareButtons.CameraHalfPressed += HardwareButtons_CameraHalfPressed;
    HardwareButtons.CameraPressed += HardwareButtons_CameraPressed;
}

The next step is to write the event handler.

Writing to “Known Folders”

The Windows UWP API has some functions already baked in that allow us to identify special folders in Windows, and save files to these folders.

To get these special folders, we use the static class “KnownFolders”. For each of these known folders, there are methods available to create files. These created files implement the IStorageFile interface – and fortunately, the _mediaCapture object has a method called CapturePhotoToStorageFileAsync, which allows us to save an image to a file which implements this interface. The code below for the event handler shows how it’s done.

private async void HardwareButtons_CameraPressed(object sender, CameraEventArgs e)
{
    // This is where we want to save to.
    var storageFolder = KnownFolders.SavedPictures;
 
    // Create the file that we're going to save the photo to.
    var file = await storageFolder.CreateFileAsync("sample.jpg", CreationCollisionOption.ReplaceExisting);
 
    // Update the file with the contents of the photograph.
    await _mediaCapture.CapturePhotoToStorageFileAsync(ImageEncodingProperties.CreateJpeg(), file);
}

So now we have a basic Windows application, which acts as a viewfinder, allows us to focus if the device is capable, and then allows us to save the presently displayed image to the special Windows SavedPictures folder. This is a pretty good app – and we’ve done it in about 100 lines of code (shown below). Not bad!

using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Devices.Enumeration;
using Windows.Foundation.Metadata;
using Windows.Media.Capture;
using Windows.Media.Devices;
using Windows.Media.MediaProperties;
using Windows.Phone.UI.Input;
using Windows.Storage;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Navigation;
 
namespace BasicCamera
{
    public sealed partial class MainPage : Page
    {
        // Provides functionality to capture the output from the camera
        private MediaCapture _mediaCapture;
 
        public MainPage()
        {
            InitializeComponent();
 
            Application.Current.Resuming += Application_Resuming;
 
            if (ApiInformation.IsTypePresent("Windows.Phone.UI.Input.HardwareButtons"))
            {
                HardwareButtons.CameraHalfPressed += HardwareButtons_CameraHalfPressed;
                HardwareButtons.CameraPressed += HardwareButtons_CameraPressed;
            }
        }
 
        private async void Application_Resuming(object sender, object o)
        {
            await InitializeCameraAsync();
        }
 
        protected override async void OnNavigatedTo(NavigationEventArgs e)
        {
            await InitializeCameraAsync();
        }
 
        private async Task InitializeCameraAsync()
        {
            if (_mediaCapture == null)
            {
                // Get the camera devices
                var cameraDevices = await DeviceInformation.FindAllAsync(DeviceClass.VideoCapture);
 
                // try to get the back facing device for a phone
                var backFacingDevice = cameraDevices
                    .FirstOrDefault(c => c.EnclosureLocation?.Panel == Windows.Devices.Enumeration.Panel.Back);
 
                // but if that doesn't exist, take the first camera device available
                var preferredDevice = backFacingDevice ?? cameraDevices.FirstOrDefault();
 
                // Create MediaCapture
                _mediaCapture = new MediaCapture();
 
                // Initialize MediaCapture and settings
                await _mediaCapture.InitializeAsync(
                    new MediaCaptureInitializationSettings
                    {
                        VideoDeviceId = preferredDevice.Id
                    });
 
                // Set the preview source for the CaptureElement
                PreviewControl.Source = _mediaCapture;
 
                // Start viewing through the CaptureElement 
                await _mediaCapture.StartPreviewAsync();
            }
        }
 
        private async void HardwareButtons_CameraHalfPressed(object sender, CameraEventArgs e)
        {
            // test if focus is supported
            if (_mediaCapture.VideoDeviceController.FocusControl.Supported)
            {
                // get the focus control from the _mediaCapture object
                var focusControl = _mediaCapture.VideoDeviceController.FocusControl;
 
                // try to get full range, but settle for the first supported one.
                var focusRange = focusControl.SupportedFocusRanges.Contains(AutoFocusRange.FullRange) ? AutoFocusRange.FullRange : focusControl.SupportedFocusRanges.FirstOrDefault();
 
                // try to get the focus mode for focussing just once, but settle for the first supported one.
                var focusMode = focusControl.SupportedFocusModes.Contains(FocusMode.Single) ? FocusMode.Single : focusControl.SupportedFocusModes.FirstOrDefault();
 
                // now configure the focus control with the range and mode as settings
                focusControl.Configure(
                    new FocusSettings
                    {
                        Mode = focusMode,
                        AutoFocusRange = focusRange
                    });
 
                // finally wait for the camera to focus
                await focusControl.FocusAsync();
            }
        }
 
        private async void HardwareButtons_CameraPressed(object sender, CameraEventArgs e)
        {
            // This is where we want to save to.
            var storageFolder = KnownFolders.SavedPictures;
 
            // Create the file that we're going to save the photo to.
            var file = await storageFolder.CreateFileAsync("sample.jpg", CreationCollisionOption.ReplaceExisting);
 
            // Update the file with the contents of the photograph.
            await _mediaCapture.CapturePhotoToStorageFileAsync(ImageEncodingProperties.CreateJpeg(), file);
        }
    }
}

Of course, there’s still a bit more to be done – this code doesn’t handle resource clean-up, or deal with what happens when the application is suspended or loses focus. We’ll look at that next time.

.net, Computer Vision, UWP, Visual Studio, Windows Store Apps

How to use the camera on your device with C# in a UWP application: Part #2, how to focus the preview

In the previous part of the series, we looked at how to preview your device’s camera output.

This part is about how to focus the device using C#. Not all devices will be capable of focussing – for example, a normal laptop webcam won’t be able to focus, but a Nokia 1520 can focus. Fortunately, we don’t need to guess – testing support for focussing is part of the API provided for Windows UWP apps. We can test this by using the “_mediaCapture” object, which we created in the code shown in Part #1.

if (_mediaCapture.VideoDeviceController.FocusControl.Supported)
{
    // Code here is executed if focus is supported by the device.
}

On my phone, I’d like to use the camera button when it’s half-pressed to focus the image. I’m able to do this in a UWP app, but I need to add a reference to a UWP library first.

Setting up mobile extension references

In the solution view in VS2015, right click on the “References” node, and select “Add Reference…”.

[Screenshot: the Add Reference menu option]

The window that appears is called the “Reference Manager”. On the left hand menu, expand the “Universal Windows” node, and select “Extensions”. In the list of extensions, tick the box for “Windows Mobile Extensions for the UWP”. Now click OK.

[Screenshot: the Reference Manager window]

Testing for hardware buttons on the device, and handling events

Obviously enough, we’ve now added a reference to a library which allows us to test for the availability of certain sensors which are specific to a mobile device, such as the hardware button used to take a picture.

if (ApiInformation.IsTypePresent("Windows.Phone.UI.Input.HardwareButtons"))
{
    // This code will only run if the HardwareButtons type is present.
}

The Camera button has three events – CameraPressed, CameraHalfPressed, and CameraReleased. I’m interested in intercepting the CameraHalfPressed event for focussing, so I’ve assigned the event handler in the code below, and put this in the constructor for the MainPage class.

if (ApiInformation.IsTypePresent("Windows.Phone.UI.Input.HardwareButtons"))
{
    HardwareButtons.CameraHalfPressed += HardwareButtons_CameraHalfPressed;
}

The event handler is shown below, including the snippet of code to test if focussing is supported.

private void HardwareButtons_CameraHalfPressed(object sender, CameraEventArgs e)
{
    if (_mediaCapture.VideoDeviceController.FocusControl.Supported)
    {
        // Focussing code is here.
    }
}

Focus range and focus mode

To focus the camera device, I need to configure the focus control of the _mediaCapture object – this means getting the focus mode and focus range. We can get the supported ranges and modes from the focus control object, and then assign these as settings. Finally, we need to call the asynchronous focus method. The code below shows how this works.

private async void HardwareButtons_CameraHalfPressed(object sender, CameraEventArgs e)
{
    // test if focus is supported
    if (_mediaCapture.VideoDeviceController.FocusControl.Supported)
    {
        // Get the focus control from the _mediaCapture object.
        var focusControl = _mediaCapture.VideoDeviceController.FocusControl;
 
        // Try to get full range autofocus, but settle for the first supported range.
        var focusRange = focusControl.SupportedFocusRanges.Contains(AutoFocusRange.FullRange) ? AutoFocusRange.FullRange : focusControl.SupportedFocusRanges.FirstOrDefault();
 
        // Try to get the focus mode for focussing just once, but settle for the first supported one.
        var focusMode = focusControl.SupportedFocusModes.Contains(FocusMode.Single) ? FocusMode.Single : focusControl.SupportedFocusModes.FirstOrDefault();
 
        // Now configure the focus control with the range and mode as settings.
        focusControl.Configure(
            new FocusSettings
            {
                Mode = focusMode,
                AutoFocusRange = focusRange
            });
 
        // Finally wait for the camera to focus.
        await focusControl.FocusAsync();
    }
}

So again, only a few lines of code are needed to register a button press event, and then configure the focus control. Hopefully this helps someone trying to set up focussing.

In the next part, I’ll look at how to change our code to actually capture an image when we fully press the camera button.

.net, Computer Vision, UWP, Visual Studio

How to use the camera on your device with C# in a UWP application: Part #1, previewing the output

I’ve recently started writing some UWP applications, and I am really enjoying learning the challenges of WPF and app programming (admittedly I’ve come to the party pretty late on this).

I’ve decided to write a short series of posts on how to use the camera on Windows devices – my plan is to write articles covering:

  1. Previewing the camera output to the device’s screen;
  2. Adding the ability to focus;
  3. Allowing the user to capture an image;
  4. And finally add error handling and resource clean-up.

This first part will just be about writing an app that will preview the device’s camera output to the device’s screen.

Since I’m adding error handling in the final part of the series, this first part will assume that the device running this code has a camera connected.

Note: This series is meant to use the absolute minimum number of lines of code necessary. For much more functional samples, check out the sample UWP code released by Microsoft to Github.

Step 1: Create the project and set capabilities

In VS2015, create a new Windows 10 UWP “Blank App” project.

[Screenshot: the new Blank App project dialog]

Once the project has been created, you need to open the Package.appxmanifest file (which was created as part of the Blank App), and click on the Capabilities tab. You need to tick the boxes for:

  • Microphone
  • Webcam

It took me a little while to understand why the microphone would be required – after all, you don’t need a microphone to take a picture. The reason is that the camera on the device is actually a video recorder (which records sound and images), so in order to use this device in code, you need access to both hardware features.
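For reference, ticking these two boxes simply adds a pair of device capabilities to the manifest XML – something like the sketch below:

<Capabilities>
  <DeviceCapability Name="microphone" />
  <DeviceCapability Name="webcam" />
</Capabilities>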

Step 2: Add the XAML control to preview the camera output

The CaptureElement control renders a stream from a capture device, such as a device camera or a webcam. We need to add one of these controls to the MainPage.xaml file.

<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <CaptureElement Name="PreviewControl" Stretch="Uniform"/>
</Grid>

Step 3: Create a private asynchronous method to initialise the camera

This is where the main part of the application lives.

We need a member variable (a class property would be fine as well) – a MediaCapture object which allows us to see a preview of what the camera sees in the CaptureElement (and later, we’ll use this to capture the photograph).

// Provides functionality to preview and capture the photograph
private MediaCapture _mediaCapture;

We’ll need to initialise the camera asynchronously a few times, so we need a method to repeat this process:

  1. First, this method needs to test whether the camera has already been instantiated (specifically, whether the MediaCapture object has been initialised). If it hasn’t, then we need to go through the process.
  2. Next, we need to get a reference to the actual camera device. We’d prefer a back facing camera (usually the case for a phone) – but since this is a UWP and might run on a desktop as well, it’s possible that a back facing camera doesn’t exist. In that case, we’ll just take a reference to whatever the first camera device is.
  3. Once we have the camera, we’ll create the MediaCapture object, and initialise it with the identifier of the camera device that we want it to use.
  4. Almost done – we’ll set the MediaCapture object to be the source of the CaptureElement object added to the Xaml earlier.
  5. Finally, tell the MediaCapture object to allow us to start previewing through the CaptureElement object.
private async Task InitializeCameraAsync()
{
    if (_mediaCapture == null)
    {                
        // Get the camera devices
        var cameraDevices = await DeviceInformation.FindAllAsync(DeviceClass.VideoCapture);
                
        // try to get the back facing device for a phone
        var backFacingDevice = cameraDevices
            .FirstOrDefault(c => c.EnclosureLocation?.Panel == Windows.Devices.Enumeration.Panel.Back);
 
        // but if that doesn't exist, take the first camera device available
        var preferredDevice = backFacingDevice ?? cameraDevices.FirstOrDefault();
 
        // Create MediaCapture
        _mediaCapture = new MediaCapture();
                
        // Initialize MediaCapture and settings
        await _mediaCapture.InitializeAsync(
            new MediaCaptureInitializationSettings {
                VideoDeviceId = preferredDevice.Id
            });
                
        // Set the preview source for the CaptureElement
        PreviewControl.Source = _mediaCapture;
                
        // Start viewing through the CaptureElement 
        await _mediaCapture.StartPreviewAsync();
    }
}

This is pretty much the most complicated part.

Step 4. Register and override app events

We need to know when the application is starting or resuming, so that we can carry out initialisation actions.

We can register one of these events in the MainPage constructor.

public MainPage()
{
    InitializeComponent();
 
    Application.Current.Resuming += Application_Resuming;
}

Additionally, we need to override the OnNavigatedTo method, which runs when we navigate to the application – the code below shows the methods that handle each of the two events.

private async void Application_Resuming(object sender, object o)
{
    await InitializeCameraAsync();
}
 
protected override async void OnNavigatedTo(NavigationEventArgs e)
{
    await InitializeCameraAsync();
}

Summary

So that’s it – just a few lines of code to display what the camera views on your device. In summary:

  1. Create an app and set the capabilities to microphone and webcam;
  2. Add a CaptureElement to the app’s Xaml;
  3. Add the code to initialise and start previewing the camera’s view through the CaptureElement.

Remember that this isn’t code that’s good enough to be used in a production app – there’s no error handling or resource clean-up yet, and it doesn’t really do anything (like focus or record a picture).

The code I used to complete this part of the series is shown below:

public sealed partial class MainPage : Page
{
    // Provides functionality to capture the output from the camera
    private MediaCapture _mediaCapture;
 
    public MainPage()
    {
        InitializeComponent();
 
        Application.Current.Resuming += Application_Resuming;
    }
 
    private async void Application_Resuming(object sender, object o)
    {
        await InitializeCameraAsync();
    }
 
    protected override async void OnNavigatedTo(NavigationEventArgs e)
    {
        await InitializeCameraAsync();
    }

    private async Task InitializeCameraAsync()
    {
        if (_mediaCapture == null)
        {
            // Get the camera devices
            var cameraDevices = await DeviceInformation.FindAllAsync(DeviceClass.VideoCapture);
 
            // try to get the back facing device for a phone
            var backFacingDevice = cameraDevices
                .FirstOrDefault(c => c.EnclosureLocation?.Panel == Windows.Devices.Enumeration.Panel.Back);
 
            // but if that doesn't exist, take the first camera device available
            var preferredDevice = backFacingDevice ?? cameraDevices.FirstOrDefault();
 
            // Create MediaCapture
            _mediaCapture = new MediaCapture();
 
            // Initialize MediaCapture and settings
            await _mediaCapture.InitializeAsync(
                new MediaCaptureInitializationSettings {
                    VideoDeviceId = preferredDevice.Id
                });
 
            // Set the preview source for the CaptureElement
            PreviewControl.Source = _mediaCapture;
 
            // Start viewing through the CaptureElement 
            await _mediaCapture.StartPreviewAsync();
        }
    }
}

Next time in this series, I’ll look at how to test if the camera is capable of focussing, and if so, how to make it focus.

C# tip, Computer Vision, OCR, Optical Character Recognition

Optical Character Recognition in C# – Part #3, using Microsoft Cognitive Services (formerly Project Oxford)

This is the third part of my series on Optical Character Recognition (OCR), and what options are available for .NET applications – particularly low cost options. The first part was about using the open source package Tesseract, and the second part was about using the Windows.Media.Ocr libraries available to applications on the Universal Windows Platform (UWP). This part is about using Microsoft’s Project Oxford – this has a component which could be described as ‘OCR as a Service’.

Since I started this series, Build 2016 has happened and a few things have changed. Project Oxford has been rebranded as part of a wide suite of API services, known as Microsoft Cognitive Services. These APIs offer functions including:

  • Computer Vision;
  • Speech;
  • Language;
  • Knowledge;
  • Search (better known as Bing services).

Microsoft have open sourced their client SDKs on Github here – this still carries some of the Project Oxford branding.

Getting started with OCR and Cognitive Services

In order to use OCR as a Service, you’ll need to get a subscription key from Microsoft. It’s pretty easy to get this, and you can sign up at this address, previewed below.

[Screenshot: the “subscribe in seconds” sign-up page]

I chose to sign up for the computer vision services (and also for Speech and Speaker Recognition previews). This allows me up to 5,000 transactions per month free of charge.

I’m able to view my subscriptions here, which shows me a screen like the one below.

[Screenshot: my subscriptions dashboard]

Let’s look at some code next.

Accessing OCR services using C#

In the previous two posts, I’ve been using a screenshot of one of my other blog posts – I want to keep using the same screenshot (shown below) in each of the three methods to be consistent.

[Image: the sample screenshot used for the OCR tests]

As a reminder, Tesseract performed reasonably well, but wasn’t able to interpret the light grey text at the top of the page. The Windows.Media.Ocr library performed very well – it detected the grey text (although it didn’t interpret it very well), but the rest of the text was detected and interpreted perfectly.

I created a new C# console project to test Project Oxford. The next step was to get the necessary client packages from Nuget.

Install-Package Microsoft.ProjectOxford.Vision

Next, I ran the code below – this is a very simple test application. I’ve created an ImageToTextInterpreter class which basically wraps the asynchronous call to Microsoft’s servers. The text results come back as an “OcrResults” object, and I’ve written a simple static function to output the textual contents of this object to the console.

Remember to enter your own Subscription Key and image file path if you try the code below.

namespace CognitiveServicesConsoleApplication
{
    using Microsoft.ProjectOxford.Vision;
    using Microsoft.ProjectOxford.Vision.Contract;
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main(string[] args)
        {
            Task.Run(async () =>
            {
                var cognitiveService = new ImageToTextInterpreter {
                    ImageFilePath = @"C:\Users\jeremy\Desktop\sample.png",
                    SubscriptionKey = "<<--[put your secret key here]-->>"
                };
 
                var results = await cognitiveService.ConvertImageToStreamAndExtractText();
 
                OutputToConsole(results);
             }).Wait();
        }
 
        private static void OutputToConsole(OcrResults results)
        {
            Console.WriteLine("Interpreted text:");
            Console.ForegroundColor = ConsoleColor.Yellow;
 
            foreach (var region in results.Regions)
            {
                foreach (var line in region.Lines)
                {
                    Console.WriteLine(string.Join(" ", line.Words.Select(w => w.Text)));
                }
            }
 
            Console.ForegroundColor = ConsoleColor.White;
            Console.WriteLine("Done.");
            Console.ReadLine();
        }
    }
 
    public class ImageToTextInterpreter
    {
        public string ImageFilePath { get; set; }

        public string SubscriptionKey { get; set; }
 
        const string UNKNOWN_LANGUAGE = "unk";
        
        public async Task<OcrResults> ConvertImageToStreamAndExtractText()
        {
            var visionServiceClient = new VisionServiceClient(SubscriptionKey);
 
            using (Stream imageFileStream = File.OpenRead(ImageFilePath))
            {
                return await visionServiceClient.RecognizeTextAsync(imageFileStream, UNKNOWN_LANGUAGE);
            }
        }
    }
}

I’ve pasted the results outputted to the console below – predictably, the result quality is almost identical to the results from the Windows.Media.Ocr test in Part #2 of the series (as the online service probably uses the same algorithms as the UWP libraries). The light grey text at the top of the image is interpreted badly, but the rest of the text has been interpreted perfectly.

[Screenshot: the interpreted text output in the console]

Conclusion

I’ve tried three methods of OCR using .NET technology – Tesseract, Windows.Media.Ocr for UWP, and online Cognitive Services. Each of these has its own advantages and disadvantages.

Tesseract interprets text reasonably well. Its big advantage is that this is a free and open source solution, which can be integrated into regular C# applications without any need to be online. However, there’s some complexity around setting up English language files.

Windows.Media.Ocr interpreted black text very well (although lower contrast text wasn’t interpreted quite as well). It can also be used offline. However, it can only be used in Windows Store apps, which might not be suitable for every application.

Cognitive Services (Project Oxford) also interpreted text very well, and as it’s a regular web service, it can be used in any C# application (so both classic C# apps and UWP apps). However, these services require the application to be online to function. This is a commercial service which limits free use to 5,000 transactions per month – over this limit, a purchase plan applies.

.net, Computer Vision, Making

How to read and create barcode images using C# and ZXing.NET

I’ve written a few posts recently on computer vision and optical character recognition. This time, I thought I’d write about a more traditional way of allowing computers to read printed information – barcode scanning.

I’ve run across a few instances in my career where applications have a need for this – for example, scanning stock inventory in and out of a warehouse. The traditional way of doing this would be to use a hardware barcode scanner connected to a computer.  These are basically the same technology as you’d see at your local supermarket – the scanner is pointed at the item’s barcode (usually a 1-D barcode), and when a valid barcode is detected, the textual representation of the code is piped to the computer’s cursor (often finishing with a newline character).

[Photo: my hardware barcode scanner]

In the barcode scanner shown above, I didn’t need to install any software to my Windows 10 computer – not even a driver, or an SDK. Getting this to work was easy – open notepad, point the scanner at the barcode, squeeze the scanner’s trigger and the numeric representation of the barcode appears in notepad, with a newline character at the end.

What about reading and writing barcodes in C#?

A barcode scanner might not always be suitable for our applications – you may already have a digital image, and want to know what this barcode represents in English text. Also, this scanner only reads 1-D barcodes, which hold a small amount of data. 2-D barcodes (such as QR codes) are now common, and can hold a lot more data.

There are several .NET solutions available to allow us to read barcodes from an image – the one I’m going to look at today is ZXing.NET. This is a .NET port of a Java project, and it’s available on NuGet under the Apache 2 licence, at beta status.

Let’s look at some examples and code.

Reading Barcodes with ZXing.NET in C#

The first thing is to install the ZXing.Net NuGet package into your project.

Install-Package ZXing.Net 

Next, let’s get a barcode – I’ve uploaded a PNG of the QR barcode that I want to decode.

[Image: the QR code to decode]

We can use the code below to read this image from my desktop:

using System;
using System.Drawing;
using ZXing;

static void Main(string[] args)
{
    // create a barcode reader instance
    var barcodeReader = new BarcodeReader();
 
    // create an in memory bitmap
    var barcodeBitmap = (Bitmap)Bitmap.FromFile(@"C:\Users\jeremy\Desktop\qrimage.bmp");
 
    // decode the barcode from the in memory bitmap
    var barcodeResult = barcodeReader.Decode(barcodeBitmap);
 
    // output results to console
    Console.WriteLine($"Decoded barcode text: {barcodeResult?.Text}");
    Console.WriteLine($"Barcode format: {barcodeResult?.BarcodeFormat}");
}

The output on the console shows that this barcode contains a link to my twitter feed, and correctly identifies the format as a QR code:

Decoded barcode text: https://twitter.com/jeremylindsayni
Barcode format: QR_CODE

There’s more about the different barcode formats here.

The code above isn’t an example of best practice – it’s simply just to show how to read a barcode.
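One refinement worth knowing about: if you already know which barcode format to expect, you can hint the reader through its options, which narrows the search. The sketch below uses ZXing.Net’s DecodingOptions type – as the package is still in beta, it’s worth double-checking the property names against the version you’re using.

using System.Collections.Generic;
using ZXing;
using ZXing.Common;

// Configure the reader to only look for QR codes, and to try harder to find them.
var barcodeReader = new BarcodeReader
{
    Options = new DecodingOptions
    {
        PossibleFormats = new List<BarcodeFormat> { BarcodeFormat.QR_CODE },
        TryHarder = true
    }
};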

Writing Barcodes with ZXing.NET in C#

Let’s suppose we want to programmatically generate a barcode in C# – it’s pretty easy to do this as well.

Say we want to generate a QR code of a link to my blog –

using ZXing;

static void Main(string[] args)
{
    // instantiate a writer object
    var barcodeWriter = new BarcodeWriter();
 
    // set the barcode format
    barcodeWriter.Format = BarcodeFormat.QR_CODE;
 
    // write text and generate a 2-D barcode as a bitmap
    barcodeWriter
        .Write("https://jeremylindsayni.wordpress.com/")
        .Save(@"C:\Users\jeremy\Desktop\generated.bmp");
}

The output is shown below:

[Image: the generated QR code]
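The default bitmap is fairly small – if you want control over the size of the generated image, the writer exposes encoding options. Again a sketch, with the same caveat about checking the API against the package version you’re using:

using ZXing;
using ZXing.Common;

// Configure the writer to produce a 300x300 pixel image with a small quiet zone.
var barcodeWriter = new BarcodeWriter
{
    Format = BarcodeFormat.QR_CODE,
    Options = new EncodingOptions
    {
        Width = 300,
        Height = 300,
        Margin = 1
    }
};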

Conclusion

I hope this is helpful to anyone trying to read or generate barcodes – the code is pretty simple. As I mentioned above, ZXing.NET is licensed under the Apache 2 licence and is open sourced at Codeplex. One more thing that is worth mentioning is that at the time of writing, it’s still in beta and the present NuGet package – version 0.14.0.1 – hasn’t been updated since April 2014.

.net, Computer Vision, Fingerprint Enrollment

How to use C# to create a bitmap of a fingerprint from the DigitalPersona U.are.U 4000 fingerprint scanner, Part #1

In a previous post, I used the BioMini fingerprint scanner to generate a bitmap image of a fingerprint. I used the Neurotechnology Free Fingerprint Verification SDK with the BioMini hardware.

As part of the process, I created an interface which allowed me to enroll a fingerprint and create the image – which covers everything I want to do at the moment. I designed this interface based on the very small amount of knowledge I have of fingerprint scanners and SDKs – so I was still interested to see if this interface would be useful (or even workable) for another scanner and SDK.
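I haven’t repeated the interface definition in this post, but the sketch below reconstructs it from how it’s used further down (the using block in the final snippet implies that it also extends IDisposable):

using System;

public interface IFingerprintScanner : IDisposable
{
    // Capture a fingerprint sample from the device.
    void Enroll();

    // Save the captured sample as a bitmap at the given path.
    void CreateBitmapFile(string pathToSaveBitmapTo);
}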

To test this, I started looking for other scanners and SDKs – and one candidate which looked very suitable was the DigitalPersona U.are.U 4000B sensor. This has a .NET SDK available – but make sure, when you’re buying the scanner device, that you get the SDK as well, as it’s possible to purchase these separately.

[Photo: the DigitalPersona U.are.U 4000B scanner]

This SDK comes with a couple of sample Windows applications – but I’ve a personal preference to try to get things to work in a console application, just because it allows me to focus more on the code to get the scanner working (and less on the code to get the Windows app working). So I decided to write a Console application for the U.are.U 4000B scanner.

There are a few simple steps:

  1. Add references to the libraries DPFPDevNET.dll and DPFPShrNET.dll, both of which come with the SDK;
  2. Instantiate a DPFP.Capture.Capture object;
  3. Associate an event handler class with this Capture object, which has handlers for the events:
    • OnComplete;
    • OnFingerGone;
    • OnFingerTouch;
    • OnReaderConnect;
    • OnReaderDisconnect;
    • OnSampleQuality.
  4. Begin capturing a fingerprint from the scanner by calling the StartCapture method from the Capture object.
  5. After placing your finger on the reader, the event OnFingerTouch will be fired.
  6. After the scan has successfully completed, the OnComplete event is fired.
    • A parameter of the OnComplete handler contains information about the scanned fingerprint.
  7. Stop capturing a fingerprint from the scanner by calling the StopCapture method from the Capture object.

This seemed pretty straightforward – I wrote the class below.

using DPFP;
using DPFP.Capture;
using System.Drawing;

public class FingerPrintScanner : DPFP.Capture.EventHandler
{
    public Capture capture { get; set; } = new Capture();
    
    public void EnrollAndSavePicture()
    {
        capture.EventHandler = this;
        capture.StartCapture();
    }
    
    public void OnComplete(object capture, string readerSerialNumber, Sample sample)
    {
        ((Capture)capture).StopCapture();
 
        var sampleConvertor = new SampleConversion();
        Bitmap bitmap = null;
        sampleConvertor.ConvertToPicture(sample, ref bitmap);
 
        bitmap.Save(@"C:\Users\jeremy\Desktop\fingerprint.bmp");
    }
 
    public void OnFingerGone(object capture, string readerSerialNumber) { }
    public void OnFingerTouch(object capture, string readerSerialNumber) { }
    public void OnReaderConnect(object capture, string readerSerialNumber) { }
    public void OnReaderDisconnect(object capture, string readerSerialNumber) { }
    public void OnSampleQuality(object capture, string readerSerialNumber, CaptureFeedback captureFeedback) { }
}

And this allowed me to write the following simple program.

class Program
{
    static void Main(string[] args)
    {
        var scanner = new FingerPrintScanner();
        scanner.EnrollAndSavePicture();
    }
}

So this is a good start – I was able to capture a fingerprint and save it to my desktop. However, this implementation doesn’t use the interface I designed last time, which has separate methods for Enroll and CreateBitmapFile. I refactored the code slightly to implement this interface.

using System;
using System.Drawing;
using DPFP;
using DPFP.Capture;

public class DigitalPersonaFingerPrintScanner : DPFP.Capture.EventHandler, IFingerprintScanner
{
    private Capture _capture;
    private Sample _sample;
 
    public void Enroll()
    {
        _capture = new Capture();
        _capture.EventHandler = this;
        _capture.StartCapture();
    }
 
    public void CreateBitmapFile(string pathToSaveBitmapTo)
    {
        if (_sample == null)
        {
            throw new NullReferenceException(nameof(_sample));
        }
 
        var sampleConvertor = new SampleConversion();
        Bitmap bitmap = null;
        sampleConvertor.ConvertToPicture(_sample, ref bitmap);
 
        bitmap.Save(pathToSaveBitmapTo);
    }
 
    public void Dispose()
    {
        _capture?.StopCapture();
        _capture?.Dispose();
    }
 
    public void OnComplete(object capture, string readerSerialNumber, Sample sample)
    {
        _capture.StopCapture();
        this._sample = sample;
    }
 
    public void OnFingerGone(object capture, string readerSerialNumber) { }
    public void OnFingerTouch(object capture, string readerSerialNumber) { }
    public void OnReaderConnect(object capture, string readerSerialNumber) { }
    public void OnReaderDisconnect(object capture, string readerSerialNumber) { }
    public void OnSampleQuality(object capture, string readerSerialNumber, CaptureFeedback captureFeedback) { }
}

This compiled, and I expected to be able to run the code below.

using (var scanner = new DigitalPersonaFingerPrintScanner())
{
    scanner.Enroll();
    scanner.CreateBitmapFile(@"C:\Users\jeremy\Desktop\fingerprint.bmp");
}

Unfortunately there was a problem – when designing the implementation, I hadn’t taken account of the fact that the device and SDK are driven by events. After I start running the program, it happily waits for someone to put their finger on the scanner, and doesn’t block the main thread – so control flows straight on from the call to Enroll to the method which tries to create an image. However, because the fingerprint sample might not have been successfully scanned at that point, I got a null reference exception.

[Screenshot: the NullReferenceException]

In the second part of this, I’ll describe how I fixed this problem, using the ManualResetEvent object.