.net core, AI/ML, Visual Studio, Visual Studio Plugin

Trying image classification with ML.NET

After watching dotNetConf videos over the last couple of weeks, I’ve been really excited to try out some of the new image classification techniques in Visual Studio.

The dotNetConf keynote included a section from Bri Actman, who is a Program Manager on the .NET Team (the relevant section is on YouTube from 58m16 to 1hr06m35s). This section showed how developers can integrate various ML techniques and code into their projects using the ModelBuilder tool in Visual Studio – in her example, photographs of the outdoors were classified according to what kind of weather they showed.

As well as the keynote, there’s another relevant dotNetConf talk by Cesar de la Torre on what’s new in ML.NET, which is also available here.

And the way to integrate this into my project looks very straightforward – right-click on the project -> Add Machine Learning, and then choose what type of scenario you want to use, as shown in the screenshot below. I’ve highlighted the feature that I’m really interested in – image classification.

ml image classification

I don’t want to re-iterate in detail what was shown in the presentation – please check out the video as it’ll be much clearer than me just writing a wall of text! – but the demonstration showed how to add images as training data: categorise those images into different folders, and name each folder with the classification you’d like to apply to the images inside it.

  • So for example, put all your images of cloudy skies into a folder called “Cloudy”, and all your images of sunny skies into a folder called “Sunny”.
  • The model builder will create a model of what it thinks is a sunny sky or a cloudy sky.

And when you point your code at a new image and ask it to predict what kind of weather is shown in that image, the code will compare that image to the constructed model and make a prediction.

This really appealed to me – powerful ML techniques presented in a way that’s easy to understand by developers who maybe haven’t done any work with deep neural networks. No need to try out (or even understand) Cost/Loss functions or Optimisers for your model, and even the model is just a zip file that you reference in your code like a library. This is tremendously powerful, especially to .NET developers like me who are trying to become familiar with the technique and what’s possible.
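
To make that concrete, here’s a rough sketch of what consuming one of those zipped models from C# can look like. The class names, file name and column names below (WeatherInput, WeatherPrediction, MLModel.zip) are just hypothetical stand-ins for whatever ModelBuilder generates for your project, so treat this as an illustration of the pattern rather than the exact generated code.

using Microsoft.ML;
using Microsoft.ML.Data;

public class WeatherInput
{
    // Hypothetical input column - the generated model defines the real schema
    public string ImagePath { get; set; }
}

public class WeatherPrediction
{
    // The predicted classification, e.g. "Sunny" or "Cloudy"
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; }

    // Confidence scores for each possible classification
    public float[] Score { get; set; }
}

public static class WeatherClassifier
{
    public static string Classify(string imagePath)
    {
        var mlContext = new MLContext();

        // Load the trained model from the zip file produced during training
        ITransformer model = mlContext.Model.Load("MLModel.zip", out DataViewSchema inputSchema);

        // Create a prediction engine and score a single image
        var engine = mlContext.Model.CreatePredictionEngine<WeatherInput, WeatherPrediction>(model);
        var prediction = engine.Predict(new WeatherInput { ImagePath = imagePath });

        return prediction.PredictedLabel;
    }
}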

How can I do this myself?

The ModelBuilder extension for Visual Studio is available for download here and you can read more about it here.

The only catch is that the image classification function isn’t actually generally available yet (sad face here). But it’s not far off – I’ve read the extension will be updated later this year. The image below shows what’s available at the time of writing this.

ml model in reality

But I want to try it now, I don’t want to wait…

Fortunately there’s a way to try out image classification in ML.NET without the model builder in VS2019 – there’s a fully working example on GitHub here. This project classifies pictures of flowers, but it’s easy to pull the code and start using your own dataset.

Beyond Cats and Dogs

It’s almost canonical now to demonstrate image classification using pictures of cats and dogs, showing how tools can generate a model that distinguishes between them reliably. But I wanted to try to push the tools a bit further. Instead of distinguishing between two different species like cats and dogs – which I can do myself – could I use machine learning (and specifically ML.NET) to distinguish between and identify different pedigrees of dogs (which is something that I don’t know how to do)?

First thing – finding a training data set

Fortunately I was able to find a really helpful dataset with over twenty thousand pictures of dogs from the Stanford site, and they’ve already been categorised into different breeds using a folder structure – exactly the way that ML.NET image classification needs images to be arranged.
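
To give a concrete picture of the layout ML.NET expects, the images end up arranged roughly like this – the breed and file names here are just illustrative, as the Stanford dataset uses its own folder names:

DogBreeds/
    Beagle/
        beagle-001.jpg
        beagle-002.jpg
    Chihuahua/
        chihuahua-001.jpg
        chihuahua-002.jpg
    GermanShepherd/
        german-shepherd-001.jpg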

The dataset is about 750MB – just below ML.NET’s present limit of 1GB – so this is a perfect size. There’s also lots of other information, like test and training features, but I chose not to use these additional features – I just wanted to throw the raw images at ML.NET with classifications stored in the folder names, and see how it performed.

Second – modifying the worked example

The example – “Deep Neural Network Transfer Learning” – is straightforward. It downloads the original dataset from a zip file, which is specified using a URL hardcoded into the DownloadImageSet method.

  • I was able to modify the code to point to a folder of classified pictures of dog breeds which can be downloaded from here.
  • Then I was able to drop my own images that I wanted to use the model on into a folder called “DogsForPrediction” (instead of FlowersForPrediction) and update the code to point to this folder.

Then I just ran the code.

Results

As you’d expect with over 20,000 images, it took a long while to work through the sample – 80% of the images were used for training, the remaining 20% were used for testing the model, and the model was trained over 100 epochs. On my machine it took a couple of hours to work through the training and testing process before it got to the part where it was predicting the type of dog from my own test images.

And it worked very well. I tried with about 10 photos that weren’t in the original dataset and it predicted each one correctly. I was surprised given the wide range of photographs (some photos were close ups, some were long range, some were pictures of the whole dog, some with just the head, some even with people in them too). But maybe I shouldn’t have been surprised – maybe that range was actually the reason why it worked so well with the photos I asked it to predict.

I even fed in a couple of photos that I thought would trick the model. One is of a rescue dog (a King Charles Spaniel – called Henry if you’re interested), and I didn’t think this breed was in the training data.

henry2

The model predicted the breed to be a “Blenheim Spaniel” (55% certainty). I had never heard of this breed, but it seems this is an alternative name for the King Charles Spaniel. So I guess the model is smarter than I am 🙂

Another photo I tried was of another rescue dog (this one isn’t a breed – again if you’re interested, he’s called Bingo). There’s no way that the model could correctly predict an answer, because this dog isn’t a recognised pedigree. We always suspected that he was half German Shepherd, but I was interested to see what my model tried to classify him as.

large bingo

The first attempt didn’t go so well – he was classified as a Tibetan Mastiff with about 97% certainty, which is definitely not right. But to be fair, it was a big picture with lots of irrelevant features, and only a small dog in the centre of it.

For my second attempt, I cropped the image…

bingo

…and this time (again, maybe not surprisingly) the top prediction was German Shepherd, with a certainty score of about 56%.

bingo results

Wrapping up

I’ve just started dipping my toes into the water and learning a bit more about what’s possible with ML.NET. And I’m impressed – the work done by the ML team meant there were far fewer barriers to entry than I expected. I was able to identify a reasonably complex challenge and start trying to model it with just a few changes to the open source sample code. This was really helpful to me and the way I like to learn – looking at sample code of real world complexity, and then starting to understand in more detail how sections of that code work. I’m looking forward to using ML.NET to solve more problems.

Dataset References:

Primary:
Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao and Li Fei-Fei. Novel dataset for Fine-Grained Image Categorization. First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

Secondary:
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.

.net core, Azure, Azure Application Insights, C# tip, Instrumentation, MVC

Healthcheck endpoints in C# in MVC projects using ASP.NET Core, and writing results to Azure Application Insights

Every developer wants to build a system that never breaks, but in reality things go wrong. The best systems are built to expect problems and handle them gracefully, rather than just silently failing.

Maybe your database becomes unavailable (e.g. runs out of hard disk space) and your failover doesn’t work – or maybe a third party web service that you depend on stops working.

Sometimes your application can be programmed to recover from things going wrong – here’s my post on The Polly Project to find out more about one way of doing that – but when there’s a catastrophic failure that you can’t recover from, you want to be alerted as soon as it happens, rather than hearing about it from a customer.

And it’s kind to provide a way for your customers to find out about the health of your system. As an example, just check out the monitoring hub below from Postcodes.io – this is a great example of being transparent about key system metrics like service status, availability, performance, and latency.

postcode

MVC projects in ASP.NET Core have a built in feature to provide information on the health of your website. It’s really simple to add it to your site, and this instrumentation comes packaged as part of the default ASP.NET Core toolkit. There are also some neat extensions available on NuGet to format the data as JSON, add a nice dashboard for these healthchecks, and finally to push the outputs to Azure Application Insights. As I’ve been implementing this recently, I wanted to share with the community how I’ve done it.

Scott Hanselman has blogged about this previously, but there have been some updates since he wrote about this which I’ve included in my post.

Returning system health from an ASP.NET Core v2.2 website

Before I start – I’ve uploaded all the code to GitHub here so you can pull the project and try yourself. You’ll obviously need to update subscription keys, instrumentation keys and connection strings for databases etc.

Edit your MVC site’s Startup.cs file and add the line below to the ConfigureServices method:

services.AddHealthChecks();

And then add the line of code below to the Configure method.

app.UseHealthChecks("/healthcheck");

That’s it. Now your website has a URL available to tell whether it’s healthy or not. When I browse to my local test site at the URL below…

http://localhost:59658/healthcheck

…my site returns the word “Healthy” (obviously your local test site’s URL will have a different port number, but you get the idea).

So this is useful, but it’s very basic. Can we amp this up a bit – let’s say we want to see a JSON representation of this, or the status of our database? Well fortunately, there’s a great series of libraries from Xabaril (available on GitHub here) which massively extend the core healthcheck functions.

Returning system health as JSON

First, install the AspNetCore.HealthChecks.UI NuGet package.

Install-Package AspNetCore.HealthChecks.UI

Now I can change the code in my Startup.cs file’s Configure method to specify some more options.

The code below changes the response output to be JSON format, rather than just the single word “Healthy”.

app.UseHealthChecks("/healthcheck", new HealthCheckOptions
    {
        Predicate = _ => true,
        ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
    });

And as you can see in the image below, when I browse to the healthcheck endpoint I configured as “/healthcheck”, it’s now returning JSON:

healthcheck basic json

What about checking the health of other system components, like URIs, SQL Server or Redis?

Xabaril has got you covered here as well. For these three types of things, I just install the NuGet packages with the commands below:

Install-Package AspNetCore.HealthChecks.Uris
Install-Package AspNetCore.HealthChecks.Redis
Install-Package AspNetCore.HealthChecks.SqlServer

Check out the project’s ReadMe file for a full list of what’s available.

Then change the code in the ConfigureServices method in the project’s Startup.cs file.

services.AddHealthChecks()
        .AddSqlServer(connectionString: Configuration.GetConnectionString("SqlServerDatabase"),
                  healthQuery: "SELECT 1;",
                  name: "Sql Server", 
                  failureStatus: HealthStatus.Degraded)
        .AddRedis(redisConnectionString: Configuration.GetConnectionString("RedisCache"),
                        name: "Redis", 
                        failureStatus: HealthStatus.Degraded)
        .AddUrlGroup(new Uri("https://localhost:59658/Home/Index"),
                        name: "Base URL",
                        failureStatus: HealthStatus.Degraded);

Obviously in the example above, I have my connection strings stored in my appsettings.json file.
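
Xabaril covers a lot of common dependencies, but if you need to check something bespoke you can also write your own check by implementing ASP.NET Core’s built-in IHealthCheck interface. Here’s a minimal sketch – the class name and the folder path it checks are just illustrative examples:

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class ImportFolderHealthCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Report degraded health if a folder the application depends on isn't reachable
        var folderExists = Directory.Exists(@"C:\ImportDropFolder");

        return Task.FromResult(folderExists
            ? HealthCheckResult.Healthy("Import folder is reachable")
            : HealthCheckResult.Degraded("Import folder is missing"));
    }
}

It can then be registered alongside the other checks with something like:

services.AddHealthChecks()
        .AddCheck<ImportFolderHealthCheck>("Import Folder");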

When I browse to the healthcheck endpoint now, I get a much richer JSON output.

health json

Can this information be displayed in a more friendly dashboard?

We don’t need to just show JSON or text output – Xabaril allows the creation of a clear and simple dashboard to display the health checks in a user friendly form. I updated my code in the Startup.cs file – first of all, my ConfigureServices method now has the code below:

services.AddHealthChecks()
        .AddSqlServer(connectionString: Configuration.GetConnectionString("SqlServerDatabase"),
                  healthQuery: "SELECT 1;",
                  name: "Sql Server", 
                  failureStatus: HealthStatus.Degraded)
        .AddRedis(redisConnectionString: Configuration.GetConnectionString("RedisCache"),
                        name: "Redis", 
                        failureStatus: HealthStatus.Degraded)
        .AddUrlGroup(new Uri("https://localhost:59658/Home/Index"),
                        name: "Base URL",
                        failureStatus: HealthStatus.Degraded);
        
services.AddHealthChecksUI(setupSettings: setup =>
{
    setup.AddHealthCheckEndpoint("Basic healthcheck", "https://localhost:59658/healthcheck");
});

And my Configure method also has the code below.

app.UseHealthChecks("/healthcheck", new HealthCheckOptions
    {
        Predicate = _ => true,
        ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
    });
 
app.UseHealthChecksUI();

Now I can browse to a new endpoint which presents the dashboard below:

http://localhost:59658/healthchecks-ui#/healthchecks

health default ui

And if you don’t like the default CSS, you can configure it to use your own. Xabaril has an example of a css file to include here, and I altered my Configure method to the code below which uses this CSS file.

app.UseHealthChecks("/healthcheck", new HealthCheckOptions
    {
        Predicate = _ => true,
        ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
    })
    .UseHealthChecksUI(setup =>
    {
        setup.AddCustomStylesheet(@"wwwroot\css\dotnet.css");
    });
 
app.UseHealthChecksUI();

And now the website is styled slightly differently, as you can see in the image below.

health styled ui

What happens when a system component fails?

Let’s break something. I’ve turned off SQL Server, and a few seconds later the UI automatically refreshes to show the overall system health status has changed – as you can see, the SQL Server check has been changed to a status of “Degraded”.

health degrades

And this same error appears in the JSON message.

health degraded json

Can I monitor these endpoints in Azure Application Insights?

Sure – but first make sure your project is configured to use Application Insights.

If you’re not familiar with Application Insights and .NET Core applications, check out some more information here.

If it’s not set up already, you can add the Application Insights Telemetry by right clicking on your project in the Solution Explorer window of VS2019, selecting “Add” from the context menu, and choosing “Application Insights Telemetry…”. This will take you through the wizard to configure your site to use Application Insights.

aitel
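
If you’d rather not use the wizard, the same wiring can also be done in code – a minimal sketch, assuming the Microsoft.ApplicationInsights.AspNetCore NuGet package is installed and the instrumentation key is available in your appsettings.json configuration:

// In Startup.cs - ConfigureServices
services.AddApplicationInsightsTelemetry();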

Once that’s done, I changed the code in my Startup.cs file’s ConfigureServices method to explicitly push to Application Insights, as shown in the snippet below:

services.AddHealthChecks()
        .AddSqlServer(connectionString: Configuration.GetConnectionString("SqlServerDatabase"),
                  healthQuery: "SELECT 1;",
                  name: "Sql Server", 
                  failureStatus: HealthStatus.Degraded)
        .AddRedis(redisConnectionString: Configuration.GetConnectionString("RedisCache"),
                        name: "Redis", 
                        failureStatus: HealthStatus.Degraded)
        .AddUrlGroup(new Uri("https://localhost:44398/Home/Index"),
                        name: "Base URL",
                        failureStatus: HealthStatus.Degraded)
        .AddApplicationInsightsPublisher();
        
services.AddHealthChecksUI(setupSettings: setup =>
{
    setup.AddHealthCheckEndpoint("Basic healthcheck", "https://localhost:44398/healthcheck");
});

Now I’m able to view these results in Application Insights – the way I did this was:

  • First browse to portal.azure.com and click on the “Application Insights” resource which has been created for your web application (it’ll probably be top of the recently created resources).
  • Once that Application Insights blade opens, click on the “Metrics” menu item (highlighted in the image below):

app insights metrics

When the chart windows opens – it’ll look like the image below – click on the “Metric Namespace” dropdown and select the “azure.applicationinsights” value (highlighted below).

app insights custom metric

Once you’ve selected the namespace to plot, choose the specific metric from that namespace. I find that the “AspNetCoreHealthCheckStatus” metric is most useful to me (as shown below).

app insights status

And finally I also choose to display the “Min” value of the status (as shown below), so if anything goes wrong the value plotted will be zero.

app insights aggregation

After this, you’ll have a graph displaying availability information for your web application. As you can see in the graph below, it’s pretty clear when I turned on my SQL Server instance again and the application went from an overall health status of ‘Degraded’ to ‘Healthy’.

application insights

Wrapping up

I’ve covered a lot of ground in this post – from .NET Core 2.2’s built in HealthCheck extensions, building on that to use community content to check other site resources like SQL Server and Redis, adding a helpful dashboard, and finally pushing results to Azure Application Insights. I’ve also created a bootstrapper project on GitHub to help anyone else interested in getting started with this – I hope it helps you.

 

.net core, Azure, C# tip, Clean Code, Dependency Injection, Inversion of Control, MVC

Azure Cache as a session data store in C# and MVC

Using the HTTP Session is one of those things that provokes…opinions. A lot of developers think that sessions are evil. And whereas I understand some of the reasons why that’s a common viewpoint – I’ve had my share of problems with code that uses sessions in the past – I’d have to qualify any criticism by saying the problems were more about how I was using the technique, rather than any inherent problem with sessions as a concept.

For example, some instances where using an in-memory session store can cause problems are:

  • If you chuck lots of website data into an in-memory session, you’ll quickly eat up lots of RAM on your web server. This might eventually cause performance problems.
  • Sessions are typically short lived – often around 20 minutes – leading to a poor user experience after a period of inactivity (like being unexpectedly logged out).
  • Also, in a load balanced environment, users might experience issues – if their first request leads to a session being created on one web server, and their next request is routed to a different (less busy) web server, then that server won’t know anything about their previous session. Apparently you can work around this using sticky sessions…and my own experiences with the sticky sessions approach are best described as “mixed”. But YMMV.
  • If you’re not using SSL/TLS, your session might be vulnerable to the Man-in-the-Middle attack vector. But the easy answer to this is….use SSL.

Anyway – I think most people would agree with the basic need for one web page to access data entered on another web page. However, if you have high throughput, a need for a large session store, or a load balanced environment, then the out-of-the-box HTTP Session object might not be for you. But that doesn’t mean you can’t use sessions at all.

‘Azure Cache for Redis’ to the rescue

So even though my application isn’t in a load-balanced environment right now, I’d still like to make sure it’s easy to port to one in the future. So I’ve been looking for alternatives to using the Session object:

  • I could use Cookies, but I can’t store very much information in them.
  • I could use a SQL database, but this seems heavyweight for my need for short-lived session-based information.
  • A NoSQL store like Redis would suit very well – it’s super fast, with low-latency, high-throughput performance.

I’ve written about using Redis as a fast-access data store a long time ago, but that post is out of date now and worth updating as there’s now a built-in Azure option – Azure Cache for Redis.

Spinning up Azure Cache for Redis

Check out the official docs for how to create a cache in Azure – it’s clearly described here with lots of screenshots to guide you through.

But how can I use Azure Cache for Redis in a website?

I don’t really like the ASP.NET implementation from the official documentation. It works, but there’s a lot of code in the controller’s action, and I’d like a cleaner solution. Ideally I’d like to inject an interface into my controller as a dependency, and use ASP.NET Core’s service container to instantiate the dependency on demand.
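
As a minimal sketch of the shape I’m after, here’s what injecting ASP.NET Core’s IDistributedCache abstraction (which the Redis caching package registers in the service container) into a controller could look like – the controller, key and value below are purely illustrative:

using System;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Caching.Distributed;

public class HomeController : Controller
{
    private readonly IDistributedCache _cache;

    // The Redis-backed IDistributedCache implementation is resolved by the service container
    public HomeController(IDistributedCache cache)
    {
        _cache = cache;
    }

    public IActionResult Index()
    {
        // Read and write cached values without the action knowing anything about Redis
        _cache.SetString("lastVisit", DateTime.Now.ToLongTimeString());
        var lastVisit = _cache.GetString("lastVisit");

        return View();
    }
}

As it turns out, the session-based approach described below builds on the same abstraction – the Session middleware stores its data through whatever IDistributedCache implementation is registered.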

I found this really useful post from Simon Holman, and he also has created a super helpful example on GitHub. I tested this with an MVC project in .NET Core v2.2, and the implementation is very simple (check out Simon’s source code for exactly where to put these snippets).

  • Update the Startup.cs file’s ConfigureServices method after putting the connection string into your appsettings.json file:
services.AddDistributedRedisCache(options =>
{
    options.Configuration = Configuration.GetConnectionString("RedisCache");
});
 
services.AddSession();
  • Update the Startup.cs file’s Configure method:
app.UseSession();
  • Here’s how to set data into the cache:
var sessionstartTime = DateTime.Now.ToLongTimeString();
HttpContext.Session.SetString("mysessiondata", sessionstartTime);
  • …get data from the cache:
var sessionstartTime = HttpContext.Session.GetString("mysessiondata");
  • …and remove data from the cache:
HttpContext.Session.Remove("mysessiondata");

Some things to bear in mind about Azure Cache for Redis

I wanted to dive into the details of Azure Cache a bit more, just to understand what’s actually going on beneath the hood when we’re reading from, and writing to, the session object.

You can see what session keys are saved in your Redis cache using the Azure portal

redis console arrow

Once you’re in the console, you run the command “scan 0 count 100 match *” to see up to the first 100 keys in your Redis cache.

redis console

From the screenshot above, I can see that I’ve got 16 sessions open.

The Guid in the Key is actually your Session’s *private* “_sessionKey”

In the image above, you can see a bunch of GUID objects which are the keys of the items in my Redis cache. And if you look at the 16th item in the list above, you can see that it corresponds to the private “_sessionKey” value, which is held in my HttpContext.Session object (compare with the VS2019 watch window below).

redis session

So this information is interesting…but I’m not sure how useful it is. Since that property is private, you can’t access it (well you can, but not easily, you have to use reflection). But it might be helpful to know this at debug time.

Browsers behave differently when in incognito mode

I thought I’d try the application with my browser in incognito mode. And every time I hit refresh on the browser when I was in incognito or private browsing mode, a new session key was created on the server – which meant it wasn’t able to obtain data from the session previously created in the same browser instance.

You can see the number of keys has hugely increased in the image below, corresponding to the number of times I hit refresh:

private window redis

But at least I can detect when the session isn’t available through the HttpContext.Session.IsAvailable property – when a session is available, the image below is what I can see in the session using a watch in the VS2019 debugger:

session available

And when a session isn’t available (such as when my browser is in incognito mode), this is what I see:

session unavailable

So at least I can programmatically distinguish between when the session can work for the user and when it can’t.
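
A minimal sketch of what that check might look like inside a controller action – the fallback behaviour here is just illustrative:

if (HttpContext.Session.IsAvailable)
{
    // Safe to read and write session values backed by Redis
    HttpContext.Session.SetString("mysessiondata", DateTime.Now.ToLongTimeString());
}
else
{
    // No session could be established for this request (e.g. the incognito scenario above),
    // so skip session storage rather than failing
}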

In summary, this behaviour had a couple of implications for me:

  • Session persistence didn’t work in incognito/private windows – values weren’t persisted across pages within the same browser session.
  • Hitting refresh a bunch of times in incognito will create lots of orphan session objects in your server, which might have security/availability implications for your application, especially if your sessions are large and fill up available memory.

Clearing down sessions was harder than I thought

HttpContext.Session.Clear() emptied my session, but didn’t delete the key from the server, as I could still see it in the Redis console.

In fact, the only way I was able to remove sessions held in Redis was to get right into the guts of the StackExchange.Redis package using the code below, since I knew the exact session that I wanted to delete had the key “57154387-d8b7-c361-a174-9d27b6c6caae”:

var connectionMultiplexer = StackExchange.Redis.ConnectionMultiplexer.Connect(Configuration.GetConnectionString("RedisCache"));
 
connectionMultiplexer.GetDatabase(connectionMultiplexer.GetDatabase().Database).KeyDelete("57154387-d8b7-c361-a174-9d27b6c6caae");

But this is only useful if you can get the exact session key that you want to delete, and that isn’t particularly easy. You could use reflection to get that private value like in the code below, but I get why that’s not something you might want to do.

var _sessionKey = typeof(DistributedSession)
                .GetField("_sessionKey", BindingFlags.NonPublic | BindingFlags.Instance)
                .GetValue(HttpContext.Session);

Wrapping up

I’ve talked a little bit about sessions in this post – they’re not a magic hammer, but also aren’t inherently a bad tool – maybe consider an alternative to in-memory sessions if you have large session objects, or are working in a load balanced environment. Azure Cache for Redis might be one of those alternatives. I’ve found it to be interesting and useful, and relatively easy to set up as an alternative to an in-memory session, but there are a few quirks – sessions may not work the way you expect them to for users who are incognito/using private browsing, and it’s hard to completely delete a session once it has been created.

.net core, Azure, Azure DevOps, Azure Pipelines

Everything as Code with Azure DevOps Pipelines: C#, ARM, and YAML: Part #4, Deploying an ARM template to create an Azure App Service with code

This series of posts is about how to use Azure DevOps and Pipelines to set up a basic C# web application, app service infrastructure, and CI/CD pipelines – with everything saved as code in our source code repository.

Last time, I finished up with a coded multi-stage Pipeline, which created three resource groups using a parameterised YAML file. The diagram below is a high level description of the overall system architecture I’ve implemented so far.

resource-group-architecture-diagram

I inserted placeholders into the pipeline for tasks like creating an Azure App Service – the diagram below shows all of these stages (including parallel ones) deployed using Azure Pipelines.

stages

This time I’ll show how to create a simple Azure App Service using code, and integrate it into my deployment pipeline – by the end of this post I’ll have implemented the infrastructure shown below.

app picture

Create an App Service using Infrastructure as Code

I’m going to use ARM (Azure Resource Manager) templates to create my Azure App Service, which is where I’ll host my website. Using an ARM template allows me to specify all of my host infrastructure details in code, which makes creating my infrastructure much easier, more repeatable and more stable.

I’ve chosen to use Visual Studio 2019 to create my ARM template – I know a lot of people prefer VSCode, but this is just the way that I’m used to creating templates.

Before starting with Visual Studio 2019, I need to make sure I’ve got the correct packs installed for Azure development – I’ve included a screenshot of the “Azure development” pack I needed to install in my “Visual Studio Updater” (highlighted below in red).

vs azure

Next, I just create a new project in Visual Studio 2019. I’ve selected the “Azure Resource Group” template type, as shown below, and then I click “Next”.

arg picture

The next screen allows me to configure the name of the project (I’ve just called my project SimpleAzureAppService), and where it’s saved. I’ve chosen to save it in the same folder as I’ve used in the previous parts of this series.

This screen offers me an option to select the .NET Framework version, but I don’t really understand the point of this as the ARM templates are JSON files.

arg configure

Next, and probably most importantly, Visual Studio 2019 gives me the option to select what kind of ARM template I want to use. There are a bunch of useful canned templates available, and to create an Azure App Service, I’ve selected the “Web app” template, as shown below.

select azure template

After I click OK on that screen, the ARM project is created. All the magic happens in the WebSite.json file, which is where all the hosting infrastructure is specified. I haven’t altered any parameters – I’ll do that in my Azure Pipeline directly.

I’m not going to talk about the innards of the WebSite.json file here – that’s a whole other series of posts.

solution explorer

At this point I pushed the code up to my public GitHub repository.

Now add a task to the Azure Pipeline

In my Azure DevOps project (which I’ve created in the previous posts, part #2 and part #3), I want to add a new stage in my pipeline which will use this ARM project to deploy my Azure App Service.

I can do this by adding an “Azure Resource Group” task. In the screen where I can edit my pipeline definition file (azure-pipelines.yml), I searched for “Azure Resource Group” in the Tasks window on the right hand side (as highlighted below).

task

Selecting this task will open a window on the right side of the screen like the one below, where you can enter parameters for the task.

arm task

I entered the values for this task that I’ve specified below:

  • Azure Subscription: “Visual Studio Professional with MSDN”
    • This might be different for you – and I’m going to replace this with the $(subscription) parameter anyway as I’ve specified it at the top of my azure-pipelines.yml file.
  • Action: Create or update resource group
  • Resource Group: integration-rg
    • This is the resource group created in the previous stage.
  • Region: UK South
  • Template Location: URL of the file
  • Template link: https://raw.githubusercontent.com/jeremylindsayni/EverythingAsCode/master/SimpleAzureAppService/SimpleAzureAppService/WebSite.json
    • Obviously this is specific to my public GitHub repo, it will be different for you.
  • Override template parameters: -hostingPlanName integration-webfarm
    • This is a parameter which is left empty in the ARM file (WebSite.json) that’s generated by Visual Studio 2019. I don’t specify this in the WebSite.parameters.json file because I want to use different parameters in the different environments (i.e. integration, testing, demonstration)
  • Deployment mode: Incremental

When you add this to your YAML, it might complain about indentation or position – for completeness, here’s the YAML stage I’ve used:

- stage: deploy_app_service_to_integration
  displayName: 'Deploy the app service to the integration environment'
  dependsOn: build_integration
  jobs:
  - job: deploy_app_service_to_integration
    pool:
      vmImage: 'Ubuntu 16.04'
    steps:
    - task: AzureResourceGroupDeployment@2
      inputs:
        azureSubscription: '$(subscription)'
        action: 'Create Or Update Resource Group'
        resourceGroupName: 'integration-rg'
        location: 'UK South'
        templateLocation: 'URL of the file'
        csmFileLink: 'https://raw.githubusercontent.com/jeremylindsayni/EverythingAsCode/master/SimpleAzureAppService/SimpleAzureAppService/WebSite.json'
        overrideParameters: '-hostingPlanName integration-webfarm'
        deploymentMode: 'Incremental'

Now I can run the pipeline, and when it finishes I can see in my Azure portal that an App Service plan has been created, with an App Service, in my “integration-rg” resource group.

webfarm

This is great – using only code, I’m able to specify infrastructure to which I’ll be able to deploy my website. You can even see the default website running in this App Service (which will show you a site like the image below) if you browse to the App Service site. For me, I just browse to a site which is the name of my App Service with “.azurewebsites.net” added to the end.

appservice website

But there’s also an Application Insights instance, which strangely has been created in the “East US” region. What’s happened here?

Fortunately, because everything is in code, I’m able to look at what the WebSite.json file has specified and work it out by doing a simple text search. Inside the JSON node that specifies the Application Insights part, the location “East US” is actually hard-coded inside the default template generated by Visual Studio 2019!

{
  "apiVersion": "2014-04-01",
  "name": "[variables('webSiteName')]",
  "type": "Microsoft.Insights/components",
  "location": "East US",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites/', variables('webSiteName'))]"
  ],
  "tags": {
    "[concat('hidden-link:', resourceGroup().id, '/providers/Microsoft.Web/sites/', variables('webSiteName'))]": "Resource",
    "displayName": "AppInsightsComponent"
  },
  "properties": {
    "applicationId": "[variables('webSiteName')]"
  }
}

I’m going to guess that when the template was written, Application Insights was not available in all regions and therefore the authors hardwired a region that they knew would work.

Anyway, I know that Application Insights is available in “UK South”, so I can modify the WebSite.json code to remove the “East US” hardwired reference, and change it to:

"location": "[resourceGroup().location]"

And now if I re-run the pipeline, everything is in the UK South region.

webfarm

You’ll also notice that the website is called “webSite” and then seemingly random numbers and letters – this is because it’s coded that way in the “WebSite.json” variables section, as shown below.

"variables": {
  "webSiteName": "[concat('webSite', uniqueString(resourceGroup().id))]"
},

Again because this is all in code, I can alter this – say I want my websites to be prefixed with “everythingAsCode-” instead of “webSite”, I can just change the variable to the code below:

"variables": {
  "webSiteName": "[concat('everythingAsCode-', uniqueString(resourceGroup().id))]"
}

And after re-running the pipeline, my website has a new name.

newwebfarm

Wrapping up

This post has shown how I can use an ARM template to specify hosting infrastructure for my website. So far, everything in the high-level architecture diagram below has been specified in code.

app picture

I’ve mentioned the advantages of this approach in the previous posts, but I’ll say it again – it blows my mind that I can arrive at a project, pull the source code, and then I can just click a button to build my own copy of the entire infrastructure and deployment pipeline using Azure DevOps. Next time I’ll look at how to deploy the default MVC website written in C# using this approach.

.net core, Azure, Azure DevOps, Azure Pipelines

Everything as Code with Azure DevOps Pipelines: C#, ARM, and YAML: Part #3, resource groups and YAML templates

This series of posts is about how to use Azure DevOps and Pipelines to set up a basic web application, app service infrastructure, and CI/CD pipelines – with everything saved as code in our source code repository.

reference-architecture-diagram-2

Previously I’ve written about how to configure multi-stage pipelines in YAML – I created a simple skeleton of different stages for creating infrastructure and deploying a website to integration, test and demonstration environments. However, the skeleton YAML had no actual build/deploy functionality.

This time I’ll cover a couple of different topics:

  • Creating resource groups for each environment in our overall solution (integration, testing and demonstration).
  • Reducing code duplication by parameterising YAML templates, so that the main pipeline can call smaller YAML templates. This simplifies the overall structure.

As always, my code is available on GitHub here.

A simple YAML template

In the previous post, I created a series of skeleton stages in my pipeline that simply wrote text to the standard output using the echo command. I’d like to start including code in my stages that now does something useful, and the first step in each stage is to create a resource group that can hold all the logically related pieces of infrastructure for each of the 3 environments (integration, test and demo).

It was pretty obvious that there was some similarity in the stages:

  • Build integration infrastructure and deploy the website to this
  • Build test infrastructure and deploy the website to this
  • Build demo infrastructure and deploy the website to this

This kind of repeating pattern is an obvious signal that the code could be parameterised and broken out into a re-usable block.

I’ve created a sample below of how I could do this – the block of code has 4 parameters:

  • name, for the name of the job
  • vmImage, for the type of vm I want my agent to be (e.g. windows, ubuntu etc)
  • displayName, to customise what I want the step to display in my pipeline,
  • resourceGroupName, obviously for the name of the resource group.

Because I just want to change one thing at a time, the code below still just echoes text to standard output.

parameters:
  name: '' 
  vmImage: ''
  displayName: ''
  resourceGroupName: ''

jobs:
- job: ${{ parameters.name }}
  pool: 
    vmImage: ${{ parameters.vmImage }}
  steps:
  - script: echo create resource group for ${{ parameters.resourceGroupName }}
    displayName: ${{ parameters.displayName }}

And I can use this in my main pipeline file – azure-pipelines.yml – as shown below. It’s pretty intuitive – the “jobs” section has a “template” which is just the path to where I’ve saved the template in my source code repository, and then I list out the parameters I want to use for this job.

- stage: build_integration
  displayName: 'Build the integration environment'
  dependsOn: build
  jobs:
  - template: ci/templates/create-resource-group.yml
    parameters:
      name: create_infrastructure
      displayName: 'First step in building the integration infrastructure'
      vmImage: 'Ubuntu-16.04'
      resourceGroupName: integration

Using Tasks in Azure Pipelines – the Azure CLI task

I’ve updated my template in the code below – instead of simply writing what it’s supposed to do, it now uses an Azure CLI task to create a resource group.

The Azure CLI makes this kind of thing quite simple – you just need to tell it the name of your resource group, and where you want to host it. So if I wanted to give my resource group the name “development-rg” and host it in the “UK South” location, the command would be:

az group create -n development-rg -l uksouth

Open the pipeline in Azure DevOps as shown below, and place the cursor where you want to add the task. The “tasks” sidebar is open by default on the right hand side.

task1

In the “tasks” sidebar there’s a search box. I want to add an Azure CLI task so that’s the text I search for – it automatically filters and I’m left with only one type of task.

task2

After clicking on the “Azure CLI” task, the sidebar refreshes to show me what information I need to supply for this task:

  • the Azure subscription to use,
  • what kind of script to run (inline or a file), and
  • the script details (as shown below, I just put the simple resource group creation script inline)

task3

Finally when I click the “Add” at the bottom right of the screen, the YAML corresponding to the task is inserted to my pipeline.

It might not be positioned quite correctly – sometimes I have to hit tab to move it to the correct position. But the Azure DevOps editor helps out by putting a red wavy line under any elements which it thinks are in the wrong place.

task4

I find this technique quite useful when I want to generate the YAML for a task, as it makes it easier for me to modify and parameterise it. I’d find it quite hard to write YAML for DevOps tasks from scratch, so I’ll take whatever help I can get.

I’ve tweaked the generated code to parameterise it, and included it as a task in my template below (highlighted in red). I always want to deploy my code to the “UK South” location so I’ve left that as an unparameterised value in the template.

parameters:
  name: '' 
  vmImage: ''
  displayName: ''
  resourceGroupName: ''
  subscription: ''

jobs:
- job: ${{ parameters.name }}
  pool: 
    vmImage: ${{ parameters.vmImage }}
  steps:
  - task: AzureCLI@1
    displayName: ${{ parameters.displayName }}
    inputs:
      azureSubscription: ${{ parameters.subscription }}
      scriptLocation: 'inlineScript'
      inlineScript: az group create -n ${{ parameters.resourceGroupName }}-rg -l uksouth

And as shown earlier, I can call it from my stage by just referencing the template location, and passing in all the parameters that need to be populated for the job to run.

trigger:
- master

variables:
  subscription: 'Visual Studio Professional with MSDN'

stages:

# previous stages...

- stage: build_integration
  displayName: 'Build the integration environment'
  dependsOn: build
  jobs:
  - template: ci/templates/create-resource-group.yml
    parameters:
      name: create_infrastructure
      displayName: 'First step in building the integration infrastructure'
      vmImage: 'Ubuntu-16.04'
      resourceGroupName: integration
      subscription: $(subscription)

# next stages...

And now we have a pipeline that builds the architecture shown below. Still no app service or website, but I’ll write about those in future posts.

resource-group-architecture-diagram

Ironically, there are now more lines of code in the solution than there would have been if I had just duplicated all the code! But that’s just because this is still pretty early skeleton code. As I build out the YAML to generate infrastructure and do more complex things, the value will become more apparent.

Service connections

Last note – if you take a fork of my repository and try to run this yourself, it probably won’t run because my Azure subscription is called “Visual Studio Professional with MSDN” and your subscription is probably called something else. You could change the text of the variable in the azure-pipelines.yml file in your fork, or you could set up a Service Connection with this name.

Setting one up is quite straightforward – hit “Project settings” at the bottom left of your Azure DevOps project, which opens up a window with a list of settings. Select “Service connections” as shown below, and select the “Azure Resource Manager” option after clicking on “New service connection”.

serviceconnection1

This will open a dialog where you can specify the connection name (I chose to call mine “Visual Studio Professional with MSDN” but you could call it whatever you want), and you can also select your subscription from the dropdown list. I didn’t need to specify a resource group, and just hit OK. You might be asked for your Azure login and password during this process.

serviceconnection2

Whatever you decided to call your service connection will be the text you can use in your pipeline to specify the subscription Azure should use.

Wrapping up

There’s been a lot of words and pictures here to achieve only a little progress, but it’s a big foundational piece – instead of having a huge YAML pipeline which could become unmaintainable, we can break it up into logical parameterised code blocks. We’ve used the Azure CLI task as an example of how to do this. Next time I’ll write about deploying ARM templates to create an Azure App Service in each of the three environments.

.net, .net core, Azure, Azure DevOps, Azure Pipelines, Web Development

Everything as Code with Azure DevOps Pipelines: C#, ARM, and YAML: Part #1

Like a lot of developers, I’ve been using Azure DevOps to manage my CI/CD pipelines. I think (or I hope anyway) most developers now are using a continuous integration process – commit to a code repository like GitHub, which has a hook into a tool (like DevOps Build Pipelines) which checks every change to make sure it compiles, and doesn’t break any tests.

Just a note – this post doesn’t have any code, it’s more of an introduction to how I’ve found Azure Pipelines can make powerful improvements to a development process. It also looks at one of the new preview features, multi-stage build pipelines. I’ll expand on individual parts of this with more code in later posts.

Azure Pipelines have traditionally been split into two types – build and release pipelines.

oldpipelines

I like to think of the split in this way:

Build Pipelines

I’ve used these for Continuous Integration (CI) activities, like compiling code, running automated tests, creating and publishing build artifacts. These pipelines can be saved as YAML – a human readable serialisation language – and pushed to a source code repository. This means that one developer can pull another developer’s CI code and run exactly the same pipeline without having to write any code. One limitation is that Build Pipelines are a simple list of tasks/jobs – historically it’s not been possible to split these instructions into separate stages which can depend on one or more conditions (though later in this post, I’ll write more about a preview feature which introduces this capability).

Release Pipelines

I’ve used these for Continuous Deployment (CD) activities, like creating environments and deploying artifacts created during the Build Pipeline process to these environments. Azure Release Pipelines can be broken up into stages with lots of customized input and output conditions, e.g. you could choose to not allow artifacts to be deployed to an environment until a manual approval process is completed. At the time of writing this, Release Pipelines cannot be saved as YAML.

What does this mean in practice?

Let’s make up a simple scenario.

A client has defined a website that they would like my team to develop, and would like to see a demonstration of progress to a small group of stakeholders at the end of each two-week iteration. They’ve also said that they care about good practice – site stability and security are important aspects of system quality. If the demonstration validates their concept at the end of the initial phase, they’ll consider opening the application up to a wider audience on an environment designed to handle a lot more user load.

Environments

The pattern I often use for promoting code through environments is:

Development machines:

  • Individual developers work on their own machine (which can be either physical or virtual) and push code from here to branches in the source code repository.
  • These branches are peer reviewed and, subject to passing review, are merged into the master branch (which is my preference for this scenario – YMMV).

Integration environment (a.k.a. Development Environment, or Dev-Int):

  • If the Build Pipeline has completed successfully (e.g. code compiles and no tests break), then the Release Pipeline deploys the build artifacts to the Integration Environment (it’s critical that these artifacts are used all the way through the deployment process as these are the ones that have been tested).
  • This environment is pretty unstable – code is frequently pushed here. In my world, it’s most typically used by developers who want to check that their code works as expected somewhere other than their machine. It’s not really something that testers or demonstrators would want to use.
  • I also like to run vulnerability scanning software like OWASP ZAP regularly on this environment to highlight any security issues, or run a baseline accessibility check using something like Pa11y.

Test environment:

  • The same binary artifacts deployed to Integration (like a zipped up package of website files) are then deployed to the Test environment.
  • I’ve usually set up this environment for testers who use it for any manual testing processes. Obviously user journey testing is automated as much as possible, but sometimes manual testing is still required – e.g. for further accessibility testing.

Demonstration environment:

  • I only push to this environment when I’m pretty confident that the code does what I expect and I’m happy to present it to a room full of people.

And in this scenario, if the client wishes to open up to a wider audience, I’d usually recommend the following two environments:

Pre-production a.k.a. QA (Quality Assurance), or Staging

  • Traditionally security checks (e.g. penetration tests) are run on this environment first, as are final performance tests.
  • This is the last environment before Production, and the infrastructure should mirror the infrastructure on Production so that results of any tests here are indicative of behaviour on Production.

Production a.k.a. Live

  • This is the most stable environment, and where customers will spend most of their time when using the application.
  • Often there’ll also be a mirror of this environment for ‘Disaster Recovery’ (DR) purposes.

Obviously different teams will have different and more customized needs – for example, sometimes teams aren’t able to deploy more frequently than once every sprint. If an emergency bug fix is required it’s useful to have a separate environment to allow these bug fixes to be tested before production, without disrupting the development team’s deployment process.

Do we always need all of these environments?

The environments used depend on the user needs – there’s no strategy which works in all cases. For our simple fictitious case, I think we only need Integration, Testing and Demonstration.

Here’s a high level process using the Microsoft stack that I could use to help meet the client’s initial need (only deploying as far as a demonstration platform):

reference-architecture-diagram-2

  • Developers build a website matching client needs, often pushing new code and features to a source code environment (I usually go with either GitHub or Azure DevOps Repos).
  • Code is compiled and tested using Azure DevOps Pipelines, and then the tested artifacts are deployed to:
    • The Integration Environment, then
    • The Testing Environment, and then
    • The Demonstration Environment.
  • Each one of these environments lives in its own Azure Resource Group with identical infrastructure (e.g. web farm, web application, database and monitoring capabilities). These are built using Azure Resource Manager (ARM) templates.

Using Azure Pipelines, I can create a Release Pipeline to build artifacts and deploy to the three environments above in three separate stages, as shown in the image below.

stages8

But as mentioned previously, a limitation is that this Release Pipeline doesn’t exist as code in my source code repository.

Fancy All New Multi-Stage Pipelines

But recently Azure has introduced a preview feature where Build Pipelines have been renamed to “Pipelines”, and some new functions have been added.

If you want to see these in your Azure DevOps instance, log into your DevOps instance on Azure, and head over to the menu in the top right – select “Preview features”:

pipelinemenu

In the dialog window that appears, turn on the “Multi-stage Pipelines” option, highlighted in the image below:

multistagepipelines

Now your DevOps pipeline menu will look like the one below – note how the “Builds” sub-menu item has been renamed to Pipelines:

newpipelines

Now I’m able to use YAML to not only capture individual build steps, but also to package them up into stages. The image below shows how I’ve started to mirror the Release Pipeline process above using YAML – I’ve built and deployed to integration and testing environments.

build stages

I’ve also shown a fork in the process where I can run my OWASP ZAP vulnerability scanning tool after the site has been deployed on integration, at the same time as the Testing environment is being built and having artifacts deployed to it. The image below shows the tests that have failed and how they’re reported – I can select individual tests and add them as Bugs to Azure DevOps Boards.

failing tests

Microsoft have supplied some example YAML to help developers get started:

  • A simple Build -> Test -> Staging -> Production scenario.
  • A scenario with a stage that on completion triggers two stages, which then are both required for the final stage.

It’s a huge process improvement to be able to have my website source code and tests as C#, my infrastructure code as ARM templates, and my pipeline code as YAML.

For example, if someone deleted the pipeline (either accidentally or deliberately), it’s not really a big deal – we can recreate everything again in a matter of minutes. Or if the pipeline was acting in an unexpected way, I could spin up a duplicate of the pipeline and debug it safely away from production.

Current Limitations

Multi-stage pipelines are a preview feature in Azure DevOps, and personally I wouldn’t risk this with every production application yet. One major missing feature is the ability to manually approve progression from one stage to another, though I understand this is on the team’s backlog.

Wrapping Up

I really like how everything can live in source code – my full build and release process, both CI and CD, are captured in human readable YAML. This is incredibly powerful – code, tests, infrastructure and the CI/CD pipeline can be created as a template and new projects can be spun up in minutes rather than days. Additionally, I’m able to create and tear down containers which cover some overall system quality testing aspects, for example using the OWASP ZAP container to scan for vulnerabilities on the integration environment website.

As I mentioned at the start of this post, I’ll be writing more over the coming weeks about the example scenario in this post – with topics such as:

  • writing a multi-stage pipeline in YAML to create resource groups to encapsulate resources for each environment;
  • how to deploy infrastructure using ARM templates in each of the stages;
  • how to deploy artifacts to the infrastructure created at each stage;
  • using the OWASP ZAP tool to scan for vulnerabilities in the integration website, and the YAML code to do this.

 

.net, .net core, Azure, Cosmos

Test driving the Cosmos SDK v3 with .NET Core – Getting started with Azure Cosmos DB and .NET Core, Part #3

The Azure Cosmos team have recently released (and open sourced) a new SDK preview with some awesome features, as recently seen on the Azure Friday show on Channel 9. So I wanted to test drive the functions available in this SDK against the one I’ve been using (SDK v2.2.2) to see the differences.

In the last two parts of this series, I’ve looked at how to create databases and collections in the Azure Cosmos emulator using .NET Core and version 2.2.2 of the Cosmos SDK. I’ve also looked at how to carry out some string queries against documents held in collections.

Version 3 of the SDK is a preview – it’s not recommended for production use yet, and I’d expect a lot of changes between now and a production ready version.

And before you read my comparison…it’s all purely opinion-based – sometimes one piece of code will look longer than another simply because of how I’ve chosen to write it. I’ve also written the samples as synchronous only – this is just because I want to focus on the SDK differences in this post, rather than get into async/await.
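
One other bit of housekeeping: the snippets below all assume a few class-level fields like the ones sketched here. The endpoint and key are the standard local Cosmos emulator values (the key is the publicly documented emulator key, not a secret), and the database and collection names are just placeholders I’ve chosen for this post rather than the exact names from the earlier parts of the series.

// Endpoint and key for the local Cosmos emulator - this is the well-known,
// publicly documented emulator key rather than a real secret
private const string CosmosEndpoint = "https://localhost:8081";
private const string EmulatorKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

// Placeholder names for the database and collection used in these samples
private const string DatabaseId = "NaturalSitesDatabase";
private const string NaturalSitesCollection = "NaturalSites";

// The two clients - one per SDK version
private static DocumentClient client;   // SDK v2.2.2
private static CosmosClient _client;    // SDK v3 preview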

Connecting to the Cosmos Emulator

Previously when I was setting up a connection to my Cosmos Emulator instance, I’d write something in C# like the code below.

#region Set up Document client
 
// Create the client connection using v2.2.2
client = new DocumentClient(
    new Uri(CosmosEndpoint),
    EmulatorKey,
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct,
        ConnectionProtocol = Protocol.Tcp
    });
 
#endregion

Now I can connect using the code below – client instantiation in SDK 3 is cleaner, and has keywords relevant to Cosmos rather than its previous name of DocumentDB. This makes it easier to read and conveys intent much better.

#region Set up Cosmos client
 
// Create the configuration using SDK v3
var configuration = new CosmosConfiguration(CosmosEndpoint, EmulatorKey)
{
    ConnectionMode = ConnectionMode.Direct,
};
 
_client = new CosmosClient(configuration);
 
#endregion

Creating the database and collections

Looking at my code with the previous SDK…well there sure is a lot of it. And it works, so I guess that’s something. But creation of objects from the UriFactory adds a lot of noise, and I’ve previously hidden code like this in a facade class.

#region Create database, collection and indexing policy
 
// Set up database and collection Uris
var databaseUrl = UriFactory.CreateDatabaseUri(DatabaseId);
var naturalSiteCollectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);
 
// Create the database if it doesn't exist
client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();
 
var naturalSitesCollection = new DocumentCollection { Id = NaturalSitesCollection };
 
// Create an indexing policy to make strings have a Ranged index.
var indexingPolicy = new IndexingPolicy();
indexingPolicy.IncludedPaths.Add(new IncludedPath
{
    Path = "/*",
    Indexes = new Collection<Microsoft.Azure.Documents.Index>()
    {
        new RangeIndex(DataType.String) { Precision = -1 }
    }
});
 
// Assign the index policy to our collection
naturalSitesCollection.IndexingPolicy = indexingPolicy;
 
// And create the collection if it doesn't exist
client.CreateDocumentCollectionIfNotExistsAsync(databaseUrl, naturalSitesCollection).Wait();
 
#endregion

The new code is much cleaner – no more UriFactories, and we again have keywords which are more relevant to Cosmos.

There are a few things I think are worth commenting on:

  • “Collections” are now “Containers” in the SDK, although they’re still Collections in the Data Explorer.
  • We can access the array of available databases from a “Databases” method accessible from the Cosmos client, and we can access available containers from a “Containers” method available from individual Cosmos databases. This object hierarchy makes much more sense to me than having to create everything from methods accessible from the DocumentClient in v2.2.2.
  • We now need to specify a partition key name for a container, whereas we didn’t need to do that in v2.2.2.
 
#region Create database, collection and indexing policy
 
// Create the database if it doesn't exist
CosmosDatabase database = _client.Databases.CreateDatabaseIfNotExistsAsync(DatabaseId).Result;
 
var containerSettings = new CosmosContainerSettings(NaturalSitesCollection, "/Name")
{
    // Assign the index policy to our container
    IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String) { Precision = -1 })
};
 
CosmosContainer container = database.Containers.CreateContainerIfNotExistsAsync(containerSettings).Result;
 
#endregion

Saving items to a container

The code using the previous SDK is pretty clean already.

#region Create a sample document in our collection
 
// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Name = "Giant's Causeway" };
 
// Create the document in our database
client.CreateDocumentAsync(naturalSiteCollectionUri, giantsCauseway).Wait();
 
#endregion

But the new SDK improves on its predecessor by using a more logical object hierarchy – we create items from an “Items” array which is available from a container, and the naming conventions are also more consistent.

#region Create sample item in our container in SDK 3
 
// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Id = Guid.NewGuid().ToString(), Name = "Giant's Causeway" };
 
// Create the item in our container
container.Items.CreateItemAsync(giantsCauseway.Name, giantsCauseway).Wait();
 
#endregion

There are also a couple of changes in the SDK worth noting:

  • When creating the item, we also need to explicitly specify the value corresponding to the partition key.
  • My custom object now needs to have an Id property of type string, decorated as a JsonProperty in the way shown in the code below. I didn’t need this with the previous SDK.
 
using Newtonsoft.Json;
 
public class NaturalSite
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
 
    public string Name { get; set; }
}

Querying a collection/container for exact and partial string matches

Using SDK 2.2.2, my code can look something like the sample below – I’ve used a query facade so I can take advantage of SDK 2.2.2’s LINQ query support (I’ve included a rough sketch of that facade after the code below).

#region Query collection for exact matches
 
// Instantiate with the DocumentClient and database identifier
var cosmosQueryFacade = new CosmosQueryFacade<NaturalSite>
{
    DocumentClient = client,
    DatabaseId = DatabaseId,
    CollectionId = NaturalSitesCollection
};
 
// We can look for strings that exactly match a search string
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name == "Giant's Causeway").Result;
 
foreach (var site in sites)
{
    Console.WriteLine($"The natural site name is: {site.Name}");
}
 
#endregion
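
I haven’t shown the facade itself in this post – below is a rough sketch of what my CosmosQueryFacade<T> might look like using the v2.2.2 APIs (CreateDocumentQuery, AsDocumentQuery and ExecuteNextAsync). Treat it as a simplified illustration rather than the exact class from earlier in the series.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
 
public class CosmosQueryFacade<TEntity>
{
    public DocumentClient DocumentClient { get; set; }
    public string DatabaseId { get; set; }
    public string CollectionId { get; set; }
 
    public async Task<IEnumerable<TEntity>> GetItemsAsync(Expression<Func<TEntity, bool>> predicate)
    {
        // Build a LINQ query against the collection and apply the caller's filter
        var query = DocumentClient.CreateDocumentQuery<TEntity>(
                UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId),
                new FeedOptions { MaxItemCount = -1, EnableCrossPartitionQuery = true })
            .Where(predicate)
            .AsDocumentQuery();
 
        // Drain the query a page at a time
        var results = new List<TEntity>();
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<TEntity>());
        }
 
        return results;
    }
}

The facade’s only job is to hide the UriFactory and paging noise I complained about earlier – the calling code just passes in a LINQ predicate.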

But in the new SDK v3, there’s presently no LINQ query function. It’s high on the team’s list of ‘things to do next’, and in the meantime I can use parameterized queries to achieve the same result.

#region Query collection for exact matches using SDK 3
 
// Or we can use the new SDK, which uses the CosmosSqlQueryDefinition object
var sql = new CosmosSqlQueryDefinition("SELECT * FROM Items i WHERE i.Name = @name")
    .UseParameter("@name", "Giant's Causeway");
 
var setIterator = container.Items.CreateItemQuery<NaturalSite>(
                    sqlQueryDefinition: sql,
                    partitionKey: "Giant's Causeway");
 
while (setIterator.HasMoreResults)
{
    foreach (var site in setIterator.FetchNextSetAsync().Result)
    {
        Console.WriteLine($"The natural site name is: {site.Name}");
    }
}
 
#endregion

For partial string matches, previously I could use the built-in LINQ functions as shown below.

#region Query collection for matches that start with our search string
 
// And we can search for strings that start with a search string,
// as long as we have strings set up to be Ranged Indexes
sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;
 
foreach (var site in sites)
{
    Console.WriteLine($"The natural site name is: {site.Name}");
}
 
#endregion

And even though we don’t have LINQ functions yet in the new SDK v3, we can still achieve the same result with the SQL query shown in the code below.

#region Or query collection for matches that start with our search string using SDK 3
 
sql = new CosmosSqlQueryDefinition("SELECT * FROM Items i WHERE STARTSWITH(i.Name, @name)")
    .UseParameter("@name", "Giant");
 
setIterator = container.Items.CreateItemQuery<NaturalSite>(
    sqlQueryDefinition: sql,
    partitionKey: "Giant's Causeway");
 
while (setIterator.HasMoreResults)
{
    foreach (var site in setIterator.FetchNextSetAsync().Result)
    {
        Console.WriteLine($"The natural site name is: {site.Name}");
    }
}
 
#endregion

What I’d like to see next

The Cosmos team have said the SDK is a preview only – it’s not suitable for production use yet, even though it already has some very nice advantages over the previous SDK. I think the things I’d like to see in future iterations are:

  • LINQ querying – which I know is already on the backlog.
  • More support for “Request Unit” information, so I can get a little more insight into the cost of my queries – the v2.2.2 sketch below shows the kind of information I mean.
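
To give a sense of that second point: in SDK v2.2.2 the response wrappers already expose a RequestCharge property, and it’d be great to see at least that much surfaced in v3. Here’s a quick sketch, reusing the client, collection Uri and POCO from the earlier snippets.

// Writes return a ResourceResponse, which carries the request charge in RUs
ResourceResponse<Document> createResponse =
    client.CreateDocumentAsync(naturalSiteCollectionUri, giantsCauseway).Result;
Console.WriteLine($"Creating the document cost {createResponse.RequestCharge} RUs");
 
// Query pages are FeedResponses, which also expose the charge
var query = client.CreateDocumentQuery<NaturalSite>(
        naturalSiteCollectionUri,
        new FeedOptions { EnableCrossPartitionQuery = true })
    .Where(m => m.Name == "Giant's Causeway")
    .AsDocumentQuery();
 
while (query.HasMoreResults)
{
    FeedResponse<NaturalSite> page = query.ExecuteNextAsync<NaturalSite>().Result;
    Console.WriteLine($"This page of results cost {page.RequestCharge} RUs");
}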

Wrapping up

The new Cosmos SDK v3 looks really interesting – it allows me to write much cleaner code with clear intent. And even though it’s not production ready yet, I’m going to start trying to use it where I can, so I’m ready to take advantage of the new features as soon as they’re generally available and supported. I hope this helps anyone else who’s thinking about trying out the new SDK – what would you like to see?