Optical Character Recognition in C# – Part #3, using Microsoft Cognitive Services (formerly Project Oxford)

This is the third part of my series on Optical Character Recognition (OCR) and the options available for .NET applications, particularly low-cost ones. The first part covered the open source package Tesseract, and the second covered the Windows.Media.Ocr libraries available to applications on the Universal Windows Platform (UWP). This part is about Microsoft’s Project Oxford, which has a component that could be described as ‘OCR as a Service’.

Since I started this series, Build 2016 has happened and a few things have changed. Project Oxford has been rebranded as part of a wider suite of API services known as Microsoft Cognitive Services. These APIs offer functions including:

  • Computer Vision;
  • Speech;
  • Language;
  • Knowledge;
  • Search (better known as Bing services).

Microsoft have open sourced their client SDKs on GitHub here – these still carry some of the Project Oxford branding.

Getting started with OCR and Cognitive Services

In order to use OCR as a Service, you’ll need to get a subscription key from Microsoft. It’s pretty easy to get this, and you can sign up at this address, previewed below.

subscribe in seconds

I chose to sign up for the computer vision services (and also for Speech and Speaker Recognition previews). This allows me up to 5,000 transactions per month free of charge.

I’m able to view my subscriptions here, which shows me a screen like the one below.

subscription_dashboard

Let’s look at some code next.

Accessing OCR services using C#

In the previous two posts, I’ve been using a screenshot of one of my other blog posts. To be consistent, I want to keep using the same screenshot (shown below) for each of the three methods.

sample_for_reading

As a reminder, Tesseract performed reasonably well, but wasn’t able to interpret the light grey text at the top of the page. The Windows.Media.Ocr library performed very well – it detected the grey text (although it didn’t interpret it very well), and the rest of the text was detected and interpreted perfectly.

I created a new C# console project to test Project Oxford. The next step was to get the necessary client package from NuGet.

Install-Package Microsoft.ProjectOxford.Vision

Next, I ran the code below – this is a very simple test application. I’ve created an ImageToTextInterpreter class which basically wraps the asynchronous call to Microsoft’s servers. The text results come back as an “OcrResults” object, and I’ve written a simple static function to output the textual contents of this object to the console.

Remember to enter your own Subscription Key and image file path if you try the code below.
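One way to avoid pasting a real key into source code is to read both values at startup. The following is a minimal sketch and not part of my original test application: the OcrSettings class, its Resolve method, and the OCR_SUBSCRIPTION_KEY environment variable name are all hypothetical names chosen for illustration.

```csharp
using System;

public static class OcrSettings
{
    // Hypothetical environment variable name, chosen for this sketch only.
    public const string KeyVariable = "OCR_SUBSCRIPTION_KEY";

    // Takes the image path from the first command-line argument and the key
    // from the supplied environment lookup; throws if either one is missing.
    public static (string ImagePath, string Key) Resolve(
        string[] args, Func<string, string> getEnv)
    {
        if (args.Length == 0 || string.IsNullOrWhiteSpace(args[0]))
        {
            throw new ArgumentException("Pass the image file path as the first argument.");
        }

        var key = getEnv(KeyVariable);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException("Set " + KeyVariable + " before running.");
        }

        return (args[0], key);
    }
}

public static class SettingsDemo
{
    public static void Main(string[] args)
    {
        try
        {
            var settings = OcrSettings.Resolve(args, Environment.GetEnvironmentVariable);
            Console.WriteLine("Image: " + settings.ImagePath);
            // Print only the length so the key itself never reaches the console.
            Console.WriteLine("Key length: " + settings.Key.Length);
        }
        catch (Exception ex) when (ex is ArgumentException || ex is InvalidOperationException)
        {
            Console.Error.WriteLine(ex.Message);
        }
    }
}
```

The two resolved values could then be assigned to the ImageFilePath and SubscriptionKey properties of the ImageToTextInterpreter class in the listing that follows, which hard-codes them for simplicity.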

namespace CognitiveServicesConsoleApplication
{
    using Microsoft.ProjectOxford.Vision;
    using Microsoft.ProjectOxford.Vision.Contract;
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main(string[] args)
        {
            Task.Run(async () =>
            {
                var cognitiveService = new ImageToTextInterpreter {
                    ImageFilePath = @"C:\Users\jeremy\Desktop\sample.png",
                    SubscriptionKey = "<<--[put your secret key here]-->>"
                };
 
                var results = await cognitiveService.ConvertImageToStreamAndExtractText();
 
                OutputToConsole(results);
            }).Wait();
        }
 
        private static void OutputToConsole(OcrResults results)
        {
            Console.WriteLine("Interpreted text:");
            Console.ForegroundColor = ConsoleColor.Yellow;
 
            foreach (var region in results.Regions)
            {
                foreach (var line in region.Lines)
                {
                    Console.WriteLine(string.Join(" ", line.Words.Select(w => w.Text)));
                }
            }
 
            // Restore the console's default colours rather than forcing white.
            Console.ResetColor();
            Console.WriteLine("Done.");
            Console.ReadLine();
        }
    }
 
    public class ImageToTextInterpreter
    {
        public string ImageFilePath { get; set; }
 
        public string SubscriptionKey { get; set; }
 
        // "unk" asks the service to detect the language automatically.
        const string UNKNOWN_LANGUAGE = "unk";
        
        public async Task<OcrResults> ConvertImageToStreamAndExtractText()
        {
            var visionServiceClient = new VisionServiceClient(SubscriptionKey);
 
            using (Stream imageFileStream = File.OpenRead(ImageFilePath))
            {
                return await visionServiceClient.RecognizeTextAsync(imageFileStream, UNKNOWN_LANGUAGE);
            }
        }
    }
}
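The nested loops in OutputToConsole walk the result as regions, then lines, then words. The same flattening can be sketched as a single LINQ query. To keep this runnable without the SDK or a subscription key, the sketch uses stand-in types (SketchWord, SketchLine, SketchRegion, SketchResults, all hypothetical) that mirror only the Regions, Lines, Words and Text properties used in the listing above, not the full SDK contract classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in types for illustration only; they copy just the shape of the
// properties that the listing above reads from the OCR result.
public class SketchWord
{
    public SketchWord(string text) { Text = text; }
    public string Text { get; }
}

public class SketchLine
{
    public SketchLine(IReadOnlyList<SketchWord> words) { Words = words; }
    public IReadOnlyList<SketchWord> Words { get; }
}

public class SketchRegion
{
    public SketchRegion(IReadOnlyList<SketchLine> lines) { Lines = lines; }
    public IReadOnlyList<SketchLine> Lines { get; }
}

public class SketchResults
{
    public SketchResults(IReadOnlyList<SketchRegion> regions) { Regions = regions; }
    public IReadOnlyList<SketchRegion> Regions { get; }
}

public static class OcrText
{
    // Returns one string per recognised line, with its words joined by single spaces.
    public static IEnumerable<string> ToLines(SketchResults results)
    {
        return results.Regions
            .SelectMany(region => region.Lines)
            .Select(line => string.Join(" ", line.Words.Select(word => word.Text)));
    }
}

public static class FlattenDemo
{
    public static void Main()
    {
        var results = new SketchResults(new[]
        {
            new SketchRegion(new[]
            {
                new SketchLine(new[] { new SketchWord("Optical"), new SketchWord("Character"), new SketchWord("Recognition") }),
                new SketchLine(new[] { new SketchWord("in"), new SketchWord("C#") })
            })
        });

        // Prints "Optical Character Recognition" and then "in C#".
        foreach (var text in OcrText.ToLines(results))
        {
            Console.WriteLine(text);
        }
    }
}
```

Returning the lines as a sequence instead of writing them straight to the console makes the text easier to reuse, for example to save it to a file or to compare the output of the three OCR methods.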

I’ve pasted the results output to the console below. Predictably, the result quality is almost identical to that of the Windows.Media.Ocr test in Part #2 of the series (the online service probably uses the same algorithms as the UWP libraries). The light grey text at the top of the image is interpreted badly, but the rest of the text has been interpreted perfectly.

translated_text

Conclusion

I’ve tried three methods of OCR using .NET technology – Tesseract, Windows.Media.Ocr for UWP, and online Cognitive Services. Each of these has different advantages and disadvantages.

Tesseract interprets text reasonably well. Its big advantage is that this is a free and open source solution, which can be integrated into regular C# applications without any need to be online. However, there’s some complexity around setting up English language files.

Windows.Media.Ocr interpreted black text very well (although lower contrast text wasn’t interpreted quite as well). It can also be used offline. However, it can only be used in Windows Store apps, which might not suit every application.

Cognitive Services (Project Oxford) also interpreted text very well, and as it’s a regular web service, it can be used in any C# application (so both classic C# apps and UWP apps). However, the application has to be online for these services to work. It is also a commercial service: free use is limited to 5,000 transactions per month, and a paid plan applies above that limit.