C# tip, Computer Vision, OCR, Optical Character Recognition

Optical Character Recognition in C# – Part #3, using Microsoft Cognitive Services (formerly Project Oxford)

This is the third part of my series on Optical Character Recognition (OCR), and what options are available for .NET applications – particularly low cost options. The first part was about using the open source package Tesseract, and the second part was about using the Windows.Media.Ocr libraries available to applications on the Universal Windows Platform (UWP). This part is about using Microsoft’s Project Oxford – this has a component which could be described as ‘OCR as a Service’.

Since I started this series, Build 2016 has happened and a few things have changed. Project Oxford has been rebranded as part of a wide suite of API services, known as Microsoft Cognitive Services. These APIs offer functions including:

  • Computer Vision;
  • Speech;
  • Language;
  • Knowledge;
  • Search (better known as Bing services).

Microsoft have open-sourced their client SDKs on GitHub here – these still carry some of the Project Oxford branding.

Getting started with OCR and Cognitive Services

In order to use OCR as a Service, you’ll need to get a subscription key from Microsoft. It’s pretty easy to get this, and you can sign up at this address, previewed below.

subscribe in seconds

I chose to sign up for the computer vision services (and also for Speech and Speaker Recognition previews). This allows me up to 5,000 transactions per month free of charge.

I’m able to view my subscriptions here, which shows me a screen like the one below.

subscription_dashboard

Let’s look at some code next.

Accessing OCR services using C#

In the previous two posts, I’ve been using a screenshot of one of my other blog posts – I want to keep using the same screenshot (shown below) in each of the three methods to be consistent.

sample_for_reading

As a reminder, Tesseract performed reasonably well, but wasn’t able to interpret the light grey text at the top of the page. The Windows.Media.Ocr library performed very well – it detected the grey text (although didn’t translate it very well), but the rest of the text was detected and interpreted perfectly.

I created a new C# console project to test Project Oxford. The next step was to get the necessary client packages from Nuget.

Install-Package Microsoft.ProjectOxford.Vision

Next, I ran the code below – this is a very simple test application. I’ve created an ImageToTextInterpreter class which basically wraps the asynchronous call to Microsoft’s servers. The text results come back as an “OcrResults” object, and I’ve written a simple static function to output the textual contents of this object to the console.

Remember to enter your own Subscription Key and image file path if you try the code below.

namespace CognitiveServicesConsoleApplication
{
    using Microsoft.ProjectOxford.Vision;
    using Microsoft.ProjectOxford.Vision.Contract;
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main(string[] args)
        {
            Task.Run(async () =>
            {
                var cognitiveService = new ImageToTextInterpreter {
                    ImageFilePath = @"C:\Users\jeremy\Desktop\sample.png",
                    SubscriptionKey = "<<--[put your secret key here]-->>"
                };
 
                var results = await cognitiveService.ConvertImageToStreamAndExtractText();
 
                OutputToConsole(results);
             }).Wait();
        }
 
        private static void OutputToConsole(OcrResults results)
        {
            Console.WriteLine("Interpreted text:");
            Console.ForegroundColor = ConsoleColor.Yellow;
 
            foreach (var region in results.Regions)
            {
                foreach (var line in region.Lines)
                {
                    Console.WriteLine(string.Join(" ", line.Words.Select(w => w.Text)));
                }
            }
 
            Console.ForegroundColor = ConsoleColor.White;
            Console.WriteLine("Done.");
            Console.ReadLine();
        }
    }
 
    public class ImageToTextInterpreter
    {
        public string ImageFilePath { get; set; }
 
        public string SubscriptionKey { get; set; }
 
        const string UNKNOWN_LANGUAGE = "unk";
        
        public async Task<OcrResults> ConvertImageToStreamAndExtractText()
        {
            var visionServiceClient = new VisionServiceClient(SubscriptionKey);
 
            using (Stream imageFileStream = File.OpenRead(ImageFilePath))
            {
                return await visionServiceClient.RecognizeTextAsync(imageFileStream, UNKNOWN_LANGUAGE);
            }
        }
    }
}
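One thing the wrapper above doesn't do is handle failures – an invalid subscription key, an unreachable service, or an unreadable image file will all surface as unhandled exceptions. The sketch below shows one way the call could be made more defensive. Treat it as a hedged illustration rather than the definitive approach: the ClientException type and its Error property are from the Project Oxford client SDK as I understand it, and the class name here is my own.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

// Hedged sketch: a defensive variant of the wrapper, catching the SDK's
// ClientException (service-side errors, e.g. a bad subscription key) and
// IO failures (an unreadable image file) separately.
public class SafeImageToTextInterpreter
{
    public string ImageFilePath { get; set; }
    public string SubscriptionKey { get; set; }

    public async Task<OcrResults> TryExtractText()
    {
        try
        {
            var client = new VisionServiceClient(SubscriptionKey);
            using (Stream imageStream = File.OpenRead(ImageFilePath))
            {
                return await client.RecognizeTextAsync(imageStream, "unk");
            }
        }
        catch (ClientException ex)
        {
            // The service rejected the request - e.g. an invalid key.
            Console.WriteLine("Service error: " + ex.Error.Message);
            return null;
        }
        catch (IOException ex)
        {
            // The image couldn't be read from disk.
            Console.WriteLine("File error: " + ex.Message);
            return null;
        }
    }
}
```

Returning null keeps the console test app simple; in a real application you might prefer to rethrow or return a result object that carries the error.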

I’ve pasted the results output to the console below – predictably, the result quality is almost identical to the results from the Windows.Media.Ocr test in Part #2 of the series (the online service probably uses the same algorithms as the UWP libraries). The light grey text at the top of the image is interpreted badly, but the rest of the text has been interpreted perfectly.

translated_text
Conclusion

I’ve tried three methods of OCR using .NET technology – Tesseract, Windows.Media.Ocr for UWP, and online Cognitive Services. Each of these has different advantages and disadvantages.

Tesseract interprets text reasonably well. Its big advantage is that this is a free and open source solution, which can be integrated into regular C# applications without any need to be online. However, there’s some complexity around setting up English language files.

Windows.Media.Ocr interpreted black text very well (although lower-contrast text wasn’t interpreted quite as well), and it can also be used offline. However, it can only be used from Windows Store apps, which might not be suitable for every application.

Cognitive Services (Project Oxford) also interpreted text very well, and as it’s a regular web service, it can be used in any C# application (so both classic C# apps and UWP apps). However, these services require the application to be online to function. This is a commercial service which limits free use to 5,000 transactions per month; above this limit, a purchase plan applies.

.net, OCR, Optical Character Recognition

Optical Character Recognition in C# in Universal Windows Applications – Part #2, using Windows.Media.Ocr

This is the second part in my series on Optical Character Recognition using C#. Last time I looked at the Apache 2 licenced package Tesseract, where I tested its recognition ability against a sample image, and wrote some sample code showing how to use it.

This time I want to test the abilities of the Windows.Media.Ocr library. This one is a bit different from a normal C# library, as this is only usable in Windows store applications, or Universal Windows Platform (UWP) applications.

I’m not going to present code samples in this post – most of the code would be about how to create a UWP application, with probably only a couple of lines dedicated to the actual OCR library. There’s an excellent blog post by Jelena Mojasevic here, which gives some sample code.
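That said, the OCR-specific part of a UWP app really does come down to only a few calls. The sketch below shows their general shape as I understand the Windows.Media.Ocr API – it's an outline to orient you, not a complete, tested sample (you'd still need the surrounding UWP application, and file access via a picker or declared capabilities).

```csharp
using System;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;
using Windows.Storage;
using Windows.Storage.Streams;

public static class OcrSketch
{
    // Hedged sketch: the core Windows.Media.Ocr calls inside a UWP app.
    public static async Task<string> RecognizeTextFromFile(StorageFile imageFile)
    {
        using (IRandomAccessStream stream = await imageFile.OpenAsync(FileAccessMode.Read))
        {
            // Decode the image into a SoftwareBitmap, the format the OCR engine reads.
            BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
            SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();

            // Create an engine from the user's profile languages;
            // this returns null if no installed language supports OCR.
            OcrEngine engine = OcrEngine.TryCreateFromUserProfileLanguages();
            if (engine == null)
            {
                throw new NotSupportedException("No OCR-capable language is installed.");
            }

            OcrResult result = await engine.RecognizeAsync(bitmap);
            return result.Text;
        }
    }
}
```

The OcrResult also exposes Lines and Words collections with bounding boxes, if you need positional information rather than just the flat text.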

Getting Started with testing a Windows.Media.Ocr app in Visual Studio 2015

Microsoft provide a huge amount of starter information and samples for UWP – these are freely available from its Github page. It’s pretty easy to test these applications – I needed a Windows Phone so I could deploy the sample applications, but that’s because I’m developing on a machine that is a bit old and doesn’t support Hyper-V. The image below shows the error I get when my Windows Phone device isn’t attached.

install windows phone tools

You can get this code using your favourite Git tool (e.g. TortoiseGit), or download the zip and extract it. The code I found useful for this was in the OCR sample directory. This solution might compile and run on your machine first time, but if it doesn’t, there are two things that it might be useful to check:

1. Make sure the UWP tools are installed.

I didn’t include all the UWP tools when I was installing VS2015 – but even if I hadn’t remembered this, it’s pretty easy to check whether they are installed. Select File -> New Project -> Visual C# -> Windows -> Universal. Since they weren’t installed on my machine, I saw a screen like the one below, which invites me to install the Universal Windows Tools:

install windows tools

I just selected this option, and my Visual Studio installer opened and guided me through the process of downloading and installing the necessary components. This took a long time so prepare to be patient!

2. Developer mode is required for running and debugging Windows Store apps

This is pretty easy to solve – if your machine isn’t set up for debugging apps, you’ll see a message like the one below:

install windows tools 3

Just follow the instructions – go to “Settings”, “Update & Security”, and “For developers”, and choose to put your computer into Developer mode (Note – do this at your own risk, this is obviously something you should only do if you’re comfortable with it!)

install windows tools 6

If you change to Developer mode, you’ll get a warning like this anyway:

install windows tools 5

Testing how the application recognises text from our sample image

I used the same image as previously, and copied it to my Windows phone. I was then able to run the OCR application through Visual Studio, which made it open on my Windows phone. Using the app, I browsed to the location I saved the file to, and triggered the app’s text recognition function. The picture below shows how the app interpreted the text from the source image:

wp_ss_20160317_0003

My review comments are:

  1. The text at the top seems to be close to gibberish – but remember this is the light grey text, which Tesseract didn’t even recognise in the last post.
  2. The rest of the text has been interpreted perfectly.

Conclusion

Windows.Media.Ocr tried to interpret the faint grey text, and didn’t fare well. However, it gave extremely impressive results for the darker text, recognising it perfectly.

So on the face of it, this is a very good option for OCR applications written in C#. But this library is only directly accessible through UWP apps – I’d prefer to be able to use it in my regular Windows applications as well. For example, I may want to allow users to upload an image to a website and have the server recognise the text in the image.

Fortunately, Microsoft have us covered – they have created the “Project Oxford” web service for exactly this kind of purpose. I’ll return to this in the third post in this series, with a bit more C# code on how to get started using this service.

 

 

.net, OCR, Optical Character Recognition

Optical Character Recognition with C# in Classic Desktop Applications – Part #1, using Tesseract

Recently I’ve become interested in optical character recognition (OCR) – I’ve discussed this with some peers and their default reaction is that the software necessary to do this is very expensive. Certainly, there are commercial packages available to carry out this function, but I wanted to investigate if there were any lower cost options available which I could use in a .NET project.

After some investigation, I found three options:

  • Tesseract – a library with a .NET wrapper;
  • Windows.Media.Ocr – a library available for Windows Store Apps;
  • Project Oxford – OCR as a Service, a commercial product supplied by Microsoft which allows 5,000 transactions per month for free.

In this post, I’ll demonstrate how to use Tesseract – in two future posts, I’ll use the Windows.Media.Ocr library, and Project Oxford to carry out OCR.

Tesseract – an OCR library with a .NET wrapper

Tesseract is an OCR library available for various different operating systems, licenced under Apache 2. I’ll look at getting this working in C# under Windows.

In order to compare these three options, I needed a single baseline – an image with some text. I decided to take a screenshot of my previous blog post.

sample_for_reading

This image seemed useful because:

  1. The font face isn’t particularly unusual, so should be a reasonable test for automated character recognition.
  2. There are a few different font sizes, so I’ll be interested to see how the software copes with this.
  3. There are different font colours – the introduction at the top of the page is in a light grey font, so should be quite challenging for the software to read.

As usual, I’m providing simple code which just gets text from an image – this isn’t meant to be an example of SOLID code, or best practices.

Tesseract is quite simple to set up and use – these instructions were heavily influenced by content from Charles Weld’s GitHub site. I’ve tried not to copy things verbatim – this is a description of what I needed to do to get things working.

1. First open Visual Studio and create a new C# Console application named “TesseractSampleApplication”.

2. Next, open the Package Manager Console and install the Tesseract nuget package using the command below:

Install-Package Tesseract 

This will add the necessary binary library to the project – Tesseract.dll. Also, there’ll be two folders added to the project, named “x86” and “x64”, containing other binaries.

3. You now need to add the English language files – these need to be in a project folder named “tessdata”. You can get these English language files from this location. The folder name can’t be changed or you’ll get an error.
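A missing or misnamed tessdata folder is the most common cause of that error, so a quick defensive check before initialising the engine can save some head-scratching. This is an illustrative sketch of my own (the helper name is hypothetical; eng.traineddata is the standard file name for the English language data):

```csharp
using System;
using System.IO;

class TessDataCheck
{
    // Hedged sketch: fail fast with a clear message if the English language
    // data isn't where Tesseract expects to find it.
    public static string FindEnglishData(string baseDirectory)
    {
        string tessDataPath = Path.Combine(baseDirectory, "tessdata");
        string englishDataFile = Path.Combine(tessDataPath, "eng.traineddata");

        if (!File.Exists(englishDataFile))
        {
            throw new FileNotFoundException(
                "Expected English language data at " + englishDataFile +
                " - check the folder is named 'tessdata' and the file is copied to the output directory.");
        }

        return tessDataPath;
    }
}
```

You could call this before constructing the TesseractEngine, passing AppDomain.CurrentDomain.BaseDirectory, so a misplaced file produces a clear message instead of an obscure initialisation error.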

4. As an optional step you can add configuration to the App.config file, which enables verbose logging. This helps a lot when things go wrong, and I got this code from this location.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup>
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6" />
    </startup>
  <system.diagnostics>
    <sources>
      <source name="Tesseract" switchValue="Verbose">
        <listeners>
          <clear />
          <add name="console" />
          <!-- Uncomment to log to file
                <add name="file" />
                -->
        </listeners>
      </source>
    </sources>
    <sharedListeners>
      <add name="console" type="System.Diagnostics.ConsoleTraceListener" />
 
      <!-- Uncomment to log to file
        <add name="file"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="c:\log\tesseract.log" />
        -->
    </sharedListeners>
  </system.diagnostics>
</configuration>

5. Finally, the C# code – this very simple application just looks at the image I show above, and interprets text from it.

namespace TesseractSampleApplication
{
    using System;
    using Tesseract;
    
    class Program
    {
        static void Main(string[] args)
        {
            var ENGLISH_LANGUAGE = @"eng";
 
            var blogPostImage = @"C:\Users\jeremy\Desktop\sample_for_reading.png";
 
            using (var ocrEngine = new TesseractEngine(@".\tessdata", ENGLISH_LANGUAGE, EngineMode.Default))
            {
                using (var imageWithText = Pix.LoadFromFile(blogPostImage))
                {
                    using (var page = ocrEngine.Process(imageWithText))
                    {
                        var text = page.GetText();
                        Console.WriteLine(text);
                        Console.ReadLine();
                    }
                }
            }
        }
    }
}

Compile and run the above code – if you added the configuration code in step 4, you’ll see a large amount of logging text, and finally the text that Tesseract reads from the image.

I found that the text interpreted from the image was:

JEREMY LINDSAY

Building a 3d printer – Taz-5,
Part 8: Building the X-axis

Last time I attached the threaded rod and guide rails for the Zraxis. With these in
place, I’m now able to start building the Xraxis.

Afew notes on this post before I begin:

1.| ran outcfblackfilamentwhile buildingthis part,sol had to usethe
yellow filament l’ve been using for my other project.

2. This was one ofthe trickiest parts ofthe project so far. The Xraxis involves
a few pieces being bolted together, and I had issues with ABS parts
shrinking slightly , which meant that holes corresponding to each other
on different parts sometimes didn’t line up perfectly.

So a few comments are:

  1. Generally this was very good. There were a few small things that went wrong:
    • “Z-axis” was interpreted as “Zraxis”, so the hyphen wasn’t seen correctly.
    • “I ran out of black filament while” was interpreted as “| ran outcfblackfilamentwhile” – the capital letter “I” was seen as a pipe character, and there were issues with spacing.
  2. The black text was recognised – however the light grey text beside my name, the brown category words, and the date of the blog post were not interpreted at all.

Conclusion

Tesseract is a good open source option for optical character recognition in C# applications. It’s simple to get started with Tesseract, and interpreted text well from the sample tested. However, there were some small issues around spacing and occasionally problems with character recognition.

Next time in this series, I’ll use the Windows.Media.Ocr library to interpret text from the same image.