After watching dotNetConf videos over the last couple of weeks, I’ve been really excited to try out some of the new image classification techniques in Visual Studio.
The dotNetConf keynote included a section from Bri Achtman, who is a Program Manager on the .NET Team (the relevant section is on YouTube from 58m16s to 1hr06m35s). This section showed how developers can integrate various ML techniques and code into their projects using the Model Builder tool in Visual Studio – in her example, photographs of the outdoors were classified according to what kind of weather they showed.
As well as the keynote, there’s another relevant dotNetConf talk by Cesar de la Torre on what’s new in ML.NET, which is also available here.
And the way to integrate this into my project looks very straightforward – right click on the project -> Add Machine Learning -> and choose what type of scenario you want to use, as shown in the screenshot below. I’ve highlighted the feature that I’m really interested in – image classification.
I don’t want to reiterate in detail what was shown in the presentation (please check out the video, as it’ll be much clearer than me just writing a wall of text!), but the demonstration showed how to add images as training data: sort those images into different folders, and name each folder after the category you’d like its images to be classified as.
- So for example, put all your images of cloudy skies into a folder called “Cloudy” – or put all your images of sunny skies into a folder called “Sunny”.
- The model builder will create a model of what it thinks is a sunny sky or a cloudy sky.
And when you point your code at a new image and ask it to predict what kind of weather is shown in that image, the code will compare that to the model constructed, and make a prediction.
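To make the folder convention concrete, here is a minimal sketch in plain Python. It is not the ML.NET or Model Builder code, just an illustration of the idea that the name of the folder is the label. The function names, the list of file extensions and the tiny demo dataset are all my own assumptions for the example.

```python
import tempfile
from pathlib import Path

# Assumption: only files with these extensions are treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def load_labelled_images(root):
    """Return sorted (file name, label) pairs; the label is the parent folder's name."""
    pairs = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            pairs.append((path.name, path.parent.name))
    return sorted(pairs)


def make_demo_folders(root):
    """Create a tiny hypothetical dataset: empty files standing in for photos."""
    layout = {"Cloudy": ["a.jpg", "b.jpg"], "Sunny": ["c.png"]}
    for label, names in layout.items():
        folder = Path(root) / label
        folder.mkdir(parents=True)
        for name in names:
            (folder / name).touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_folders(tmp)
        for name, label in load_labelled_images(tmp):
            print(f"{name} -> {label}")
```

The point of the sketch is that no separate labels file is needed: moving a picture into a different folder is all it takes to relabel it.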
This really appealed to me – powerful ML techniques presented in a way that’s easy to understand by developers who maybe haven’t done any work with deep neural networks. No need to try out (or even understand) Cost/Loss functions or Optimisers for your model, and even the model is just a zip file that you reference in your code like a library. This is tremendously powerful, especially to .NET developers like me who are trying to become familiar with the technique and what’s possible.
How can I do this myself?
The only catch is that the image classification function isn’t actually generally available yet (sad face here). But it’s not far off – I’ve read the extension will be updated later this year. The image below shows what’s available at the time of writing this.
But I want to try it now, I don’t want to wait…
Fortunately there’s a way to try out image classification in ML.NET without the model builder in VS2019 – there’s a fully working example on GitHub here. This project classifies pictures of flowers, but it’s easy to pull the code and start using your own dataset.
Beyond Cats and Dogs
It’s almost canonical now to demonstrate image classification using pictures of cats and dogs, showing how tools can generate a model that distinguishes between them reliably. But I wanted to try to push the tools a bit further. Instead of distinguishing between two different species like cats and dogs – which I can do myself – could I use machine learning (and specifically ML.NET) to distinguish between and identify different breeds of dog (which is something that I don’t know how to do)?
First thing – finding a training data set
Fortunately I was able to find a really helpful dataset with over twenty thousand pictures of dogs from the Stanford site, and also super helpfully they’ve been categorised into different types using a folder structure – exactly the way that ML.NET image classification needs images to be arranged.
The dataset is about 750MB – just below ML.NET’s present limit of 1GB – so this is a perfect size of dataset. There’s lots of other information, like test and training features, but I chose not to use these additional features. I just wanted to throw the raw images at ML.NET, with classifications stored in the folder names, and see how it performed.
Second – modifying the worked example
The example – “Deep Neural Network Transfer Learning” – is straightforward example code. It downloads the original dataset from a zip file, which is specified using a URL hardcoded into the DownloadImageSet method.
- I was able to modify the code to point to a folder of classified pictures of dog breeds which can be downloaded from here.
- Then I was able to drop my own images that I wanted to use the model on into a folder called “DogsForPrediction” (instead of FlowersForPrediction) and update the code to point to this folder.
Then I just ran the code.
As you’d expect, because there were over 20,000 images, it took a long while to work through the sample: 80% of the images were used for training and the remaining 20% for testing the model, and the model was trained over 100 epochs. On my machine it took a couple of hours to work through the training and testing process before it got to the part where it predicted the type of dog from my own test images.
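For anyone who hasn’t come across a train/test split before, here is a rough sketch of the idea in plain Python. This is not how the ML.NET sample implements it; the function name, the fixed seed and the placeholder file names are assumptions made purely for illustration.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items, then hold back test_fraction of them for testing."""
    shuffled = list(items)
    # A fixed seed (an arbitrary choice here) makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]


if __name__ == "__main__":
    # Hypothetical placeholders standing in for 20,000 image paths.
    images = [f"image_{i}.jpg" for i in range(20000)]
    train, test = train_test_split(images)
    print(len(train), "training images,", len(test), "test images")
```

Shuffling before splitting matters when the files are listed folder by folder: without it, the held-back 20% could consist entirely of the last few breeds in the list.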
And it worked very well. I tried with about 10 photos that weren’t in the original dataset and it predicted each one correctly. I was surprised given the wide range of photographs (some photos were close ups, some were long range, some were pictures of the whole dog, some with just the head, some even with people in them too). But maybe I shouldn’t have been surprised – maybe that range was actually the reason why it worked so well with the photos I asked it to predict.
I even fed in a couple of photos that I thought would trick the model. One is of a rescue dog (a King Charles Spaniel – called Henry if you’re interested), and I didn’t think this breed was in the training data.
The model predicted the breed to be a “Blenheim Spaniel” (55% certainty). I had never heard of this breed, but it seems that this is an alternative name for the King Charles Spaniel. So I guess the model is smarter than I am 🙂
Another photo I tried was of another rescue dog (this one isn’t a single breed; again, if you’re interested, he’s called Bingo). There’s no way that the model could correctly predict an answer, because this dog isn’t a recognised pedigree. We always suspected that he was half German Shepherd, but I was interested to see what my model tried to classify him as.
First attempt didn’t go so well – he was classified as a Tibetan Mastiff with about 97% certainty, and that’s definitely not right. But to be fair, it was a big picture with lots of non-relevant features, with only a small dog in the centre of it.
For my second attempt, I cropped the image…
…and this time (again, maybe not surprisingly) the top prediction was German Shepherd, with a certainty score of about 56%.
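It’s worth sketching where a certainty score like this typically comes from. The snippet below is plain Python rather than ML.NET code, and it assumes the classifier ends with a softmax step that turns raw per-class outputs into scores summing to 1, which is typical for this kind of model but not something I’ve confirmed in the sample. The labels and raw numbers are invented for illustration and are not output from my model.

```python
import math


def softmax(raw_outputs):
    """Convert raw per-class outputs into scores that sum to 1."""
    peak = max(raw_outputs)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - peak) for value in raw_outputs]
    total = sum(exps)
    return [value / total for value in exps]


def top_prediction(labels, raw_outputs):
    """Return the (label, score) pair with the highest score."""
    scores = softmax(raw_outputs)
    best = max(range(len(labels)), key=lambda index: scores[index])
    return labels[best], scores[best]


if __name__ == "__main__":
    # Hypothetical labels and raw outputs, made up for this example.
    labels = ["German Shepherd", "Malinois", "Tibetan Mastiff"]
    raw_outputs = [2.0, 1.0, 0.5]
    label, score = top_prediction(labels, raw_outputs)
    print(f"{label}: {score:.0%}")
```

Under that assumption, the scores are shared out only among the breeds the model was trained on, so a mixed-breed dog will still be given one of the known labels, sometimes with a very high score.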
I’ve just started dipping my toes into the water and learning a bit more about what’s possible with ML.NET. And I’m impressed – the work done by the ML team meant there were far fewer barriers to entry than I expected. I was able to identify a reasonably complex challenge and start trying to model it with just a few changes to the open source sample code. This was really helpful to me and the way I like to learn: looking at sample code of real world complexity, and then starting to understand in more detail how sections of that code work. I’m looking forward to using ML.NET to solve more problems.
Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao and Li Fei-Fei. Novel dataset for Fine-Grained Image Categorization. First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.