.net, .net core, Azure, Azure DevOps, Azure Pipelines, Web Development

Everything as Code with Azure DevOps Pipelines: C#, ARM, and YAML: Part #1

Like a lot of developers, I’ve been using Azure DevOps to manage my CI/CD pipelines. I think (or I hope, anyway) that most developers now use a continuous integration process – commit to a code repository like GitHub, which has a hook into a tool (like DevOps Build Pipelines) that checks every change to make sure it compiles and doesn’t break any tests.

Just a note – this post doesn’t have any code, it’s more of an introduction to how I’ve found Azure Pipelines can make powerful improvements to a development process. It also looks at one of the new preview features, multi-stage build pipelines. I’ll expand on individual parts of this with more code in later posts.

Azure Pipelines have traditionally been split into two types – build and release pipelines.

[Image: the old Azure DevOps menu, with separate Builds and Releases sections]

I like to think of the split in this way:

Build Pipelines

I’ve used these for Continuous Integration (CI) activities, like compiling code, running automated tests, creating and publishing build artifacts. These pipelines can be saved as YAML – a human readable serialisation language – and pushed to a source code repository. This means that one developer can pull another developer’s CI code and run exactly the same pipeline without having to write any code. One limitation is that Build Pipelines are a simple list of tasks/jobs – historically it’s not been possible to split these instructions into separate stages which can depend on one or more conditions (though later in this post, I’ll write more about a preview feature which introduces this capability).

Release Pipelines

I’ve used these for Continuous Deployment (CD) activities, like creating environments and deploying artifacts created during the Build Pipeline process to these environments. Azure Release Pipelines can be broken up into stages with lots of customized input and output conditions, e.g. you could choose to not allow artifacts to be deployed to an environment until a manual approval process is completed. At the time of writing this, Release Pipelines cannot be saved as YAML.

What does this mean in practice?

Let’s make up a simple scenario.

A client has defined a website that they would like my team to develop, and would like to see a demonstration of progress to a small group of stakeholders at the end of each two-week iteration. They’ve also said that they care about good practice – site stability and security are important aspects of system quality. If the demonstration validates their concept at the end of the initial phase, they’ll consider opening the application up to a wider audience on an environment designed to handle a lot more user load.

Environments

The pattern I often use for promoting code through environments is:

Development machines:

  • Individual developers work on their own machine (which can be either physical or virtual) and push code from here to branches in the source code repository.
  • These branches are peer reviewed and, subject to passing review, merged into the master branch (which is my preference for this scenario – YMMV).

Integration environment (a.k.a. Development Environment, or Dev-Int):

  • If the Build Pipeline has completed successfully (e.g. code compiles and no tests break), then the Release Pipeline deploys the build artifacts to the Integration Environment (it’s critical that these artifacts are used all the way through the deployment process as these are the ones that have been tested).
  • This environment is pretty unstable – code is frequently pushed here. In my world, it’s most typically used by developers who want to check that their code works as expected somewhere other than their machine. It’s not really something that testers or demonstrators would want to use.
  • I also like to run vulnerability scanning software like OWASP ZAP regularly on this environment to highlight any security issues, or run a baseline accessibility check using something like Pa11y.

Test environment:

  • The same binary artifacts deployed to Integration (like a zipped up package of website files) are then deployed to the Test environment.
  • I’ve usually set up this environment for testers who use it for any manual testing processes. Obviously user journey testing is automated as much as possible, but sometimes manual testing is still required – e.g. for further accessibility testing.

Demonstration environment:

  • I only push to this environment when I’m pretty confident that the code does what I expect and I’m happy to present it to a room full of people.

And in this scenario, if the client wishes to open up to a wider audience, I’d usually recommend the following two environments:

Pre-production a.k.a. QA (Quality Assurance), or Staging

  • Traditionally security checks (e.g. penetration tests) are run on this environment first, as are final performance tests.
  • This is the last environment before Production, and the infrastructure should mirror the infrastructure on Production so that results of any tests here are indicative of behaviour on Production.

Production a.k.a. Live

  • This is the most stable environment, and where customers will spend most of their time when using the application.
  • Often there’ll also be a mirror of this environment for ‘Disaster Recovery’ (DR) purposes.

Obviously different teams will have different and more customized needs – for example, sometimes teams aren’t able to deploy more frequently than once every sprint. If an emergency bug fix is required it’s useful to have a separate environment to allow these bug fixes to be tested before production, without disrupting the development team’s deployment process.

Do we always need all of these environments?

The environments used depend on the user needs – there’s no strategy which works in all cases. For our simple fictitious case, I think we only need Integration, Testing and Demonstration.

Here’s a high level process using the Microsoft stack that I could use to help meet the client’s initial need (only deploying as far as a demonstration platform):

[Image: high-level reference architecture diagram, showing the flow from developers through Azure DevOps Pipelines to the three environments]

  • Developers build a website matching client needs, often pushing new code and features to a source code repository (I usually go with either GitHub or Azure DevOps Repos).
  • Code is compiled and tested using Azure DevOps Pipelines, and then the tested artifacts are deployed to:
    • The Integration Environment, then
    • The Testing Environment, and then
    • The Demonstration Environment.
  • Each one of these environments lives in its own Azure Resource Group with identical infrastructure (e.g. web farm, web application, database and monitoring capabilities). These are built using Azure Resource Manager (ARM) templates.

Using Azure Pipelines, I can create a Release Pipeline to build artifacts and deploy to the three environments above in three separate stages, as shown in the image below.

[Image: a Release Pipeline with three stages – Integration, Testing and Demonstration]

But as mentioned previously, a limitation is that this Release Pipeline doesn’t exist as code in my source code repository.

Fancy All New Multi-Stage Pipelines

But recently Microsoft has introduced a preview feature in which Build Pipelines have been renamed “Pipelines”, and have gained some new functions.

If you want to see these in your Azure DevOps instance, log into your DevOps instance on Azure, and head over to the menu in the top right – select “Preview features”:

[Image: the “Preview features” option in the Azure DevOps user menu]

In the dialog window that appears, turn on the “Multi-stage Pipelines” option, highlighted in the image below:

[Image: the “Multi-stage pipelines” option in the Preview features dialog]

Now your DevOps pipeline menu will look like the one below – note how the “Builds” sub-menu item has been renamed to Pipelines:

[Image: the updated menu, with “Builds” renamed to “Pipelines”]

Now I’m able to use YAML to not only capture individual build steps, but also to package them up into stages. The image below shows how I’ve started to mirror the Release Pipeline process above using YAML – I’ve built and deployed to integration and testing environments.

[Image: a multi-stage pipeline showing build and deployment stages]

I’ve also shown a fork in the process, where I can run my OWASP ZAP vulnerability scanning tool after the site has been deployed to integration, at the same time as the Testing environment is being built and having artifacts deployed to it. The image below shows the tests that have failed and how they’re reported – I can select individual tests and add them as Bugs to Azure DevOps Boards.

[Image: failing OWASP ZAP tests reported in Azure DevOps]

Microsoft have supplied some example YAML to help developers get started:

  • A simple Build -> Test -> Staging -> Production scenario.
  • A scenario with a stage that on completion triggers two stages, which then are both required for the final stage.

It’s a huge process improvement to be able to have my website source code and tests as C#, my infrastructure code as ARM templates, and my pipeline code as YAML.

For example, if someone deleted the pipeline (either accidentally or deliberately), it’s not really a big deal – we can recreate everything again in a matter of minutes. Or if the pipeline was acting in an unexpected way, I could spin up a duplicate of the pipeline and debug it safely away from production.

Current Limitations

Multi-stage pipelines are a preview feature in Azure DevOps, and personally I wouldn’t risk this with every production application yet. One major missing feature is the ability to manually approve progression from one stage to another, though I understand this is on the team’s backlog.

Wrapping Up

I really like how everything can live in source code – my full build and release process, both CI and CD, are captured in human readable YAML. This is incredibly powerful – code, tests, infrastructure and the CI/CD pipeline can be created as a template and new projects can be spun up in minutes rather than days. Additionally, I’m able to create and tear down containers which cover some overall system quality testing aspects, for example using the OWASP ZAP container to scan for vulnerabilities on the integration environment website.

As I mentioned at the start of this post, I’ll be writing more over the coming weeks about the example scenario in this post – with topics such as:

  • writing a multi-stage pipeline in YAML to create resource groups to encapsulate resources for each environment;
  • how to deploy infrastructure using ARM templates in each of the stages;
  • how to deploy artifacts to the infrastructure created at each stage;
  • using the OWASP ZAP tool to scan for vulnerabilities in the integration website, and the YAML code to do this.

 

.net, .net core, Azure, Cosmos

Test driving the Cosmos SDK v3 with .NET Core – Getting started with Azure Cosmos DB and .NET Core, Part #3

The Azure Cosmos team have recently released (and open sourced) a new SDK preview with some awesome features, as recently seen on the Azure Friday show on Channel 9. So I wanted to test drive the functions available in this SDK against the one I’ve been using (SDK v2.2.2) to see the differences.

In the last two parts of this series, I’ve looked at how to create databases and collections in the Azure Cosmos emulator using .NET Core and version 2.2.2 of the Cosmos SDK. I’ve also looked at how to carry out some string queries against documents held in collections.

Version 3 of the SDK is a preview – it’s not recommended for production use yet, and I’d expect a lot of changes between now and a production ready version.

And before you read my comparison…it’s all purely opinion based – sometimes one piece of code will look longer than another because of how I’ve chosen to write it. I’ve also written the samples as synchronous only – this is just because I want to focus on the SDK differences in this post, rather than explore an async/await topic.

Connecting to the Cosmos Emulator

Previously when I was setting up a connection to my Cosmos Emulator instance, I’d write something in C# like the code below.

#region Set up Document client
 
// Create the client connection using v2.2.2
client = new DocumentClient(
    new Uri(CosmosEndpoint),
    EmulatorKey,
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct,
        ConnectionProtocol = Protocol.Tcp
    });
 
#endregion

Now I can connect using the code below – client instantiation in SDK 3 is cleaner, and has keywords relevant to Cosmos rather than its previous name of DocumentDB. This makes it easier to read and conveys intent much better.

#region Set up Cosmos client
 
// Create the configuration using SDK v3
var configuration = new CosmosConfiguration(CosmosEndpoint, EmulatorKey)
{
    ConnectionMode = ConnectionMode.Direct,
};
 
_client = new CosmosClient(configuration);
 
#endregion

Creating the database and collections

Looking at my code with the previous SDK…well there sure is a lot of it. And it works, so I guess that’s something. But creation of objects from the UriFactory adds a lot of noise, and I’ve previously hidden code like this in a facade class.

#region Create database, collection and indexing policy
 
// Set up database and collection Uris
var databaseUrl = UriFactory.CreateDatabaseUri(DatabaseId);
var naturalSiteCollectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);
 
// Create the database if it doesn't exist
client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();
 
var naturalSitesCollection = new DocumentCollection { Id = NaturalSitesCollection };
 
// Create an indexing policy to make strings have a Ranged index.
var indexingPolicy = new IndexingPolicy();
indexingPolicy.IncludedPaths.Add(new IncludedPath
{
    Path = "/*",
    Indexes = new Collection<Microsoft.Azure.Documents.Index>()
    {
        new RangeIndex(DataType.String) { Precision = -1 }
    }
});
 
// Assign the index policy to our collection
naturalSitesCollection.IndexingPolicy = indexingPolicy;
 
// And create the collection if it doesn't exist
client.CreateDocumentCollectionIfNotExistsAsync(databaseUrl, naturalSitesCollection).Wait();
 
#endregion

The new code is much cleaner – no more UriFactories, and we again have keywords which are more relevant to Cosmos.

There are a few things I think are worth commenting on:

  • “Collections” are now “Containers” in the SDK, although they’re still Collections in the Data Explorer.
  • We can access the available databases through a “Databases” property on the Cosmos client, and available containers through a “Containers” property on individual Cosmos databases. This object hierarchy makes much more sense to me than having to create everything from methods on the DocumentClient in v2.2.2.
  • We now need to specify a partition key name for a container, whereas we didn’t need to do that in v2.2.2.
#region Create database, collection and indexing policy
 
// Create the database if it doesn't exist
CosmosDatabase database = _client.Databases.CreateDatabaseIfNotExistsAsync(DatabaseId).Result;
 
var containerSettings = new CosmosContainerSettings(NaturalSitesCollection, "/Name")
{
    // Assign the index policy to our container
    IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String) { Precision = -1 })
};
 
CosmosContainer container = database.Containers.CreateContainerIfNotExistsAsync(containerSettings).Result;
 
#endregion

Saving items to a container

The code using the previous SDK is pretty clean already.

#region Create a sample document in our collection
 
// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Name = "Giant's Causeway" };
 
// Create the document in our database
client.CreateDocumentAsync(naturalSiteCollectionUri, giantsCauseway).Wait();
 
#endregion

But the new SDK improves on its predecessor by using a more logical object hierarchy – we create items through an “Items” property available from a container, and the naming conventions are also more consistent.

#region Create sample item in our container in SDK 3
 
// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Id = Guid.NewGuid().ToString(), Name = "Giant's Causeway" };
 
// Create the document in our database
container.Items.CreateItemAsync(giantsCauseway.Name, giantsCauseway).Wait();
 
#endregion

There are also a couple of changes in the SDK worth noting:

  • When creating the item, we also need to explicitly specify the value corresponding to the partition key.
  • My custom object now needs to have an ID property with type of string, decorated as a JsonProperty in the way shown in the code below. I didn’t need this with the previous SDK.
public class NaturalSite
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
 
    public string Name { get; set; }
}

Querying a collection/container for exact and partial string matches

Using SDK 2.2.2, my code can look something like the sample below – I’ve used a query facade and can take advantage of SDK 2.2.2’s LINQ querying function.

#region Query collection for exact matches
 
// Instantiate with the DocumentClient and database identifier
var cosmosQueryFacade = new CosmosQueryFacade<NaturalSite>
{
    DocumentClient = client,
    DatabaseId = DatabaseId,
    CollectionId = NaturalSitesCollection
};
 
// We can look for strings that exactly match a search string
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name == "Giant's Causeway").Result;
 
foreach (var site in sites)
{
    Console.WriteLine($"The natural site name is: {site.Name}");
}
 
#endregion

But in the new SDK v3, there’s presently no LINQ query function. It’s high on the team’s list of ‘things to do next’, and in the meantime I can use parameterized queries to achieve the same result.

#region Query collection for exact matches using SDK 3
 
// Or we can use the new SDK, which uses the CosmosSqlQueryDefinition object
var sql = new CosmosSqlQueryDefinition("Select * from Items i where i.Name = @name")
                                                           .UseParameter("@name", "Giant's Causeway");
 
 
var setIterator = container.Items.CreateItemQuery<NaturalSite>(
                    sqlQueryDefinition: sql,
                    partitionKey: "Giant's Causeway");
 
while (setIterator.HasMoreResults)
{
    foreach (var site in setIterator.FetchNextSetAsync().Result)
    {
        Console.WriteLine($"The natural site name is: {site.Name}");
    }
}
 
#endregion

For partial string matches, previously I could use the built in LINQ functions as shown below.

#region Query collection for matches that start with our search string
 
// And we can search for strings that start with a search string,
// as long as we have strings set up to be Ranged Indexes
sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;
 
foreach (var site in sites)
{
    Console.WriteLine($"The natural site name is: {site.Name}");
}
 
#endregion

And even though we don’t have LINQ functions yet in the new SDK v3, we can still achieve the same result with the SQL query shown in the code below.

#region Or query collection for matches that start with our search string using SDK 3
 
sql = new CosmosSqlQueryDefinition("SELECT * FROM Items i WHERE STARTSWITH(i.Name, @name)")
    .UseParameter("@name", "Giant");
 
setIterator = container.Items.CreateItemQuery<NaturalSite>(
    sqlQueryDefinition: sql,
    partitionKey: "Giant's Causeway");
 
while (setIterator.HasMoreResults)
{
    foreach (var site in setIterator.FetchNextSetAsync().Result)
    {
        Console.WriteLine($"The natural site name is: {site.Name}");
    }
}
 
#endregion

What I’d like to see next

The Cosmos team have said the SDK is a preview only – it’s not suitable for production use yet, even though it already has some very nice advantages over the previous SDK. I think the things I’d like to see in future iterations are:

  • LINQ querying – which I know is already on the backlog.
  • More support for “Request Unit” information, so I can get a little more insight into the cost of my queries.
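For context on that second point – the v2.2.2 SDK already surfaces this in a basic way, because the ResourceResponse and FeedResponse objects it returns carry a RequestCharge property. The code below is just a minimal sketch of what I mean, reusing the client, collection Uri and POCO from earlier in this post – I’d love to see the new SDK expose this kind of information just as conveniently.

// The ResourceResponse returned by CreateDocumentAsync in SDK v2.2.2
// tells us how many Request Units the write consumed
var writeResult = client.CreateDocumentAsync(naturalSiteCollectionUri, giantsCauseway).Result;
 
Console.WriteLine($"This write consumed {writeResult.RequestCharge} RUs");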

Wrapping up

The new Cosmos SDK v3 looks really interesting – it allows me to write much cleaner code with clear intent. And even though it’s not production ready yet, I’m going to start trying to use it where I can so I’m ready to take advantage of the new features as soon as they’re more generally available, and supported. I hope this helps anyone else who’s thinking about trying out the new SDK – what would you like to see?

.net, .net core, Non-functional Requirements, Performance

Using async/await and Task.WhenAll to improve the overall speed of your C# code

Recently I’ve been looking at ways to improve the performance of some .NET code, and this post is about an async/await pattern that I’ve observed a few times that I’ve been able to refactor.

Every so often, I see code like the sample below – a single method or service which awaits the outputs of numerous methods which are marked as asynchronous.

await FirstMethodAsync();
 
await SecondMethodAsync();
 
await ThirdMethodAsync();

The three methods don’t seem to depend on each other in any way, and since they’re all asynchronous methods, it’s possible to run them in parallel. But for some reason, the implementation runs all three sequentially – the flow of execution awaits the first method running and completing, then the second, and then the third.

We might be able to do better than this.

Let’s look at an example

For this post, I’ve created a couple of sample methods which can be run asynchronously – they’re called SlowAndComplexSumAsync and SlowAndComplexWordAsync.

What these methods actually do isn’t important, so don’t worry about what function they serve – I’ve just contrived them to do something and be quite slow, so I can observe how my code’s overall performance alters as I do some refactoring.

First, SlowAndComplexSumAsync (below) adds a few numbers together, with some artificial delays to deliberately slow it down – this takes about 2.5s to run.

private static async Task<int> SlowAndComplexSumAsync()
{
    int sum = 0;
    foreach (var counter in Enumerable.Range(0, 25))
    {
        sum += counter;
        await Task.Delay(100);
    }
 
    return sum;
}

Next, SlowAndComplexWordAsync (below) concatenates characters together, again with some artificial delays to slow it down. This method usually takes about 4s to run.

private static async Task<string> SlowAndComplexWordAsync()
{
    var word = string.Empty;
    foreach (var counter in Enumerable.Range(65, 26))
    {
        word = string.Concat(word, (char) counter);
        await Task.Delay(150);
    }
 
    return word;
}

Running sequentially – the slow way

Obviously I can just prefix each method with the “await” keyword in a Main method marked with the async keyword, as shown below. This code basically just runs the two sample methods sequentially (despite the async/await cruft in the code).

private static async Task Main(string[] args)
{
    var stopwatch = new Stopwatch();
    stopwatch.Start();
 
    // This method takes about 2.5s to run
    var complexSum = await SlowAndComplexSumAsync();
 
    // The elapsed time will be approximately 2.5s so far
    Console.WriteLine("Time elapsed when sum completes..." + stopwatch.Elapsed);
 
    // This method takes about 4s to run
    var complexWord = await SlowAndComplexWordAsync();
    
    // The elapsed time at this point will be about 6.5s
    Console.WriteLine("Time elapsed when both complete..." + stopwatch.Elapsed);
    
    // These lines are to prove the outputs are as expected,
    // i.e. 300 for the complex sum and "ABC...XYZ" for the complex word
    Console.WriteLine("Result of complex sum = " + complexSum);
    Console.WriteLine("Result of complex letter processing " + complexWord);
 
    Console.Read();
}

When I run this code, the console output looks like the image below:

[Image: console output showing the two methods running one after the other, totalling just under 7s]

As can be seen in the console output, both methods run consecutively – the first one takes a bit over 2.5s, and then the second method runs (taking a bit over 4s), causing the total running time to be just under 7s (which is pretty close to the predicted duration of 6.5s).

Running in parallel – the faster way

But I’ve missed a great opportunity to make this program run faster. Instead of running each method and waiting for it to complete before starting the next one, I can start them all together and await the Task.WhenAll method to make sure all methods are completed before proceeding to the rest of the program.

This technique is shown in the code below.

private static async Task Main(string[] args)
{
    var stopwatch = new Stopwatch();
    stopwatch.Start();
 
    // this task will take about 2.5s to complete
    var sumTask = SlowAndComplexSumAsync();
 
    // this task will take about 4s to complete
    var wordTask = SlowAndComplexWordAsync();
 
    // running them in parallel should take about 4s to complete
    await Task.WhenAll(sumTask, wordTask);

    // The elapsed time at this point will only be about 4s
    Console.WriteLine("Time elapsed when both complete..." + stopwatch.Elapsed);
 
    // These lines are to prove the outputs are as expected,
    // i.e. 300 for the complex sum and "ABC...XYZ" for the complex word
    Console.WriteLine("Result of complex sum = " + sumTask.Result);
    Console.WriteLine("Result of complex letter processing " + wordTask.Result);
 
    Console.Read();
}

And the outputs are shown in the image below.

[Image: console output showing both methods completing together in a bit over 4s]

The total running time is now only a bit over 4s – and this is way better than the previous time of around 7s. This is because we are running both methods in parallel, and making full use of the opportunity asynchronous methods present. Now our total execution time is only as slow as the slowest method, rather than being the cumulative time for all methods executing one after each other.
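As a small aside – once Task.WhenAll has completed, I don’t strictly need the Result property to read the outputs. Both tasks are guaranteed to be finished at that point, so awaiting each one again just returns its value immediately, and reads a little more cleanly. A sketch of the alternative ending to the Main method above:

// Both tasks have already completed, so these awaits return immediately
var complexSum = await sumTask;
var complexWord = await wordTask;
 
Console.WriteLine("Result of complex sum = " + complexSum);
Console.WriteLine("Result of complex letter processing " + complexWord);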

Wrapping up

I hope this post has helped shine a little light on how to use the async/await keywords and how to use Task.WhenAll to run independent methods in parallel.

Obviously every case has its own merits – but if code has a series of asynchronous methods written so that each one has to wait for the previous one to complete, definitely check whether the code can be refactored to use Task.WhenAll to improve the overall speed.

And maybe even more importantly, when designing an API surface, keep in mind that decoupling dependencies between methods might give developers using the API an opportunity to run these asynchronous methods in parallel.



.net core, Azure, Cosmos, NoSQL

Getting started with Azure Cosmos DB and .NET Core: Part #2 – string querying and ranged indexes

Last time I scratched the surface of creating databases and collections in Azure Cosmos using the emulator and some C# code written using .NET Core. This time I’m going to dig a bit deeper into how to query these databases and collections with C#, and show a few code snippets that I’m using to help remove cruft from my classes. I’m also going to write a little about Indexing Policies and how to use them to do useful string comparison queries.

Initializing Databases and Collections

I use the DocumentClient object to create databases and collections, and previously I used the CreateDatabaseAsync and CreateDocumentCollectionAsync methods to create databases and document collections.

But after running my test project a few times it got a bit annoying to keep having to delete the database from my local Cosmos instance before running my code, or have the code throw an exception.

Fortunately I’ve discovered the Cosmos SDK has a nice solution for this – a couple of methods which are named CreateDatabaseIfNotExistsAsync and CreateDocumentCollectionIfNotExistsAsync.

string DatabaseId = "LocalLandmarks";
string NaturalSitesCollection = "NaturalSites";
 
var databaseUrl = UriFactory.CreateDatabaseUri(DatabaseId);
var collectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);
 
client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();
 
client.CreateDocumentCollectionIfNotExistsAsync(databaseUrl, new DocumentCollection { Id = NaturalSitesCollection }).Wait();

Now I can initialize my code repeatedly without having to tear down my database or handle exceptions.

What about querying by something more useful than the document resource ID?

Last time I wrote some code that took a POCO and inserted it as a document into the Cosmos emulator.

// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Name = "Giant's Causeway" };
 
// Add this POCO as a document in Cosmos to our natural site collection
var collectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);
var itemResult = client.CreateDocumentAsync(collectionUri, giantsCauseway).Result;

Then I was able to query the database for that document using the document resource ID.

// Use the ID to retrieve the object we just created
var document = client
    .ReadDocumentAsync(
        UriFactory.CreateDocumentUri(DatabaseId, NaturalSitesCollection, itemResult.Resource.Id))
    .Result;

But that’s not really useful to me – I’d rather query by a property of the POCO. For example, I’d like to query by the Name property, perhaps with an object instantiation and method signature like the suggestion below:

// Instantiate with the DocumentClient and database identifier
var cosmosQueryFacade = new CosmosQueryFacade<NaturalSite>
{
    DocumentClient = client,
    DatabaseId = DatabaseId,
    CollectionId = NaturalSitesCollection
};
 
// Querying one collection
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name == "Giant's Causeway").Result;

There’s a really useful sample project available with the Cosmos emulator which provided some code that I’ve adapted – you can access it from the Quickstart screen in the Data Explorer (available at https://localhost:8081/_explorer/index.html after you start the emulator). The image below shows how I’ve accessed the sample, which is available by clicking on the “Download” button after selecting the .NET Core tab.

[Image: downloading the .NET Core sample app from the emulator’s Quickstart screen]

The code below shows a query facade class that I have created – I can instantiate the object with parameters like the Cosmos DocumentClient, and the database identifier.

I’m going to be enhancing this Facade over the next few posts in this series, including how to use the new version 3.0 of the Cosmos SDK which has recently entered public preview.

public class CosmosQueryFacade<T> where T : class
{
    public string CollectionId { get; set; }
 
    public string DatabaseId { get; set; }
 
    public DocumentClient DocumentClient { get; set; }
 
    public async Task<IEnumerable<T>> GetItemsAsync(Expression<Func<T, bool>> predicate)
    {
        var documentCollectionUrl = UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId);
 
        var query = DocumentClient.CreateDocumentQuery<T>(documentCollectionUrl)
            .Where(predicate)
            .AsDocumentQuery();
 
        var results = new List<T>();
 
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<T>());
        }
 
        return results;
    }
}

This class lets me query when I know the full name of the site. But what happens if I want to do a different kind of query – instead of exact comparison, what about something like “StartsWith”?

// Querying using LINQ StartsWith  
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;

If I run this, I get an error:

An invalid query has been specified with filters against path(s) 
that are not range-indexed. 
Consider adding allow scan header in the request.

What’s gone wrong? The clue is in the error message – I don’t have the right indexes applied to my collection.
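As the error message hints, there is a workaround that doesn’t involve changing the index – the v2.2.2 SDK lets a query opt in to scanning through FeedOptions. I’m not taking that approach in this post, because every such query pays for a full collection scan rather than an index lookup, but a minimal sketch would look like the code below.

// Workaround (not used in this post): allow the query to scan the collection
var scanningQuery = client.CreateDocumentQuery<NaturalSite>(
    UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection),
    new FeedOptions { EnableScanInQuery = true });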

Indexing Policies in Cosmos

From Wikipedia, an index is a data structure that improves the speed of data retrieval from a database. But as we’ve seen from the error above, in Cosmos it’s even more than this. Certain types of index won’t permit certain types of comparison operation, and when I tried to carry out that operation, by default I got an error (rather than just a slow response).

One of the really well publicised benefits of Cosmos is that documents added to collections in an Azure Cosmos database are automatically indexed. And while that’s extremely powerful and useful, it’s not magic – Cosmos can’t know which indexes match my specific business logic, and won’t add them.

There are three types of indexes in Cosmos:

  • Hash, used for:
    • Equality queries e.g. m => m.Name == “Giant’s Causeway”
  • Range, used for:
    • Equality queries,
    • Comparison within a range, e.g. m => m.Age > 5, or m => m.Name.StartsWith(“Giant”)
    • Ordering e.g. OrderBy(m => m.Name)
  • Spatial – used for geo-spatial data – more on this in future posts.

So I’ve created a collection called “NaturalSites” in my Cosmos emulator, and added some data to it – but how can I find out what the present indexing policy is? That’s pretty straightforward – it’s all in the Data Explorer again. Go to the Explorer tab, expand the database to see its collections, and then click on the “Scale & settings” menu item – this will show you the indexing policy for the collection.

[Image: the collection’s indexing policy shown in the Scale & settings pane]
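The policy can also be read back in C# – the snippet below is a minimal sketch using the v2.2.2 SDK’s ReadDocumentCollectionAsync method to fetch the collection and inspect its IndexingPolicy.

// Read the collection back and inspect its indexing policy
var collectionResponse = client
    .ReadDocumentCollectionAsync(
        UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection))
    .Result;
 
foreach (var includedPath in collectionResponse.Resource.IndexingPolicy.IncludedPaths)
{
    Console.WriteLine("Indexed path: " + includedPath.Path);
}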

When I created the database and collection from C#, the indexing policy created by default is shown below:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}

I can see that in the list of indexes for my collection, the dataType of String has an index kind of Hash (the second entry in the “indexes” array above). We know this index is good for equality comparisons but, as the error message from before suggests, we need this to be a Range index to be able to do more complex comparisons than just equality between two strings.

I can modify the index policy for the collection in C#, as shown below:

// Set up Uris to create database and collection
var databaseUri = UriFactory.CreateDatabaseUri(DatabaseId);
var constructedSiteCollectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, ConstructedSitesCollection);
 
// Create the database
client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();
 
// Create a document collection
var naturalSitesCollection = new DocumentCollection { Id = NaturalSitesCollection };
// Now create the policy to make strings a Ranged index
var indexingPolicy = new IndexingPolicy();
indexingPolicy.IncludedPaths.Add(new IncludedPath
{
    Path = "/*",
    Indexes = new Collection<Microsoft.Azure.Documents.Index>()
    {
        new RangeIndex(DataType.String) { Precision = -1 }
    }
});

// Now assign the policy to the document collection
naturalSitesCollection.IndexingPolicy = indexingPolicy;
 
// And finally create the document collection
client.CreateDocumentCollectionIfNotExistsAsync(databaseUri, naturalSitesCollection).Wait();

And now if I inspect the Data Explorer for this collection, the index policy created is shown below. As you can see from the first entry in the “indexes” array, the kind of index now used for the String dataType is Range.

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": []
}

So when I run the code below to look for sites that start with “Giant”, the code now works and returns objects rather than throwing an exception.

var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;

There are many more indexing examples here if you’re interested.

Wrapping up

I’ve taken a small step beyond the previous part of this tutorial, and I’m now able to query for strings that exactly and partially match values in my Cosmos database. As usual I’ve uploaded my code to GitHub and you can pull the code from here. Next time I’m going to try to convert my code to the new version of the SDK, which is now in public preview.


.net core, Azure, Cosmos

Getting started with Azure Cosmos DB and .NET Core: Part #1 – Installing the Cosmos emulator, writing and reading data

I’d like to start using Cosmos, and I have a bunch of questions about it – how to create databases, how to write to it and read from it, how can I use attachments and spatial data, how can I secure it, how can I test the code that uses it…and lots more. So I’m going to write a few posts over the coming weeks which hopefully will answer these questions, starting with some basics and moving to more advanced topics in later posts.

Can I trial Cosmos to help me understand it a bit more?

Fortunately Microsoft has an answer for this – they’ve provided a Cosmos emulator, and I can trial Cosmos without going near the Azure cloud.

The official Microsoft docs on the Cosmos Emulator are fantastic – you can install it locally or use a Docker image.

My own preference is to use the installer. I’ve tried using the Docker image and this needs to download a Windows container which totals well over 5GB, which can take a long time.

[Image: docker pull downloading the multi-gigabyte Cosmos emulator Windows container]

The emulator’s installer is only about 50MB and I was able to get up and running with this a lot faster than with Docker containers. There were some snags when I installed it – after trying to run it for the first time, I got this message:

[Image: the error message shown on first running the Cosmos emulator]

But this was pretty easy to work around by just following the instruction in the message and running the emulator with the NoFirewall option:

Microsoft.Azure.Cosmos.Emulator.exe /NoFirewall

I prefer to manage the emulator from PowerShell – to do this, after installing the emulator I run the PowerShell command below to import modules that let me use some useful PowerShell commands.

Import-Module "$env:ProgramFiles\Azure Cosmos DB Emulator\PSModules\Microsoft.Azure.CosmosDB.Emulator"

And now I can control the emulator with those built in PowerShell commands.
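For example, commands like the ones below – this is just a sketch from memory, and running Get-Command -Module Microsoft.Azure.CosmosDB.Emulator will list everything the module actually provides.

# Check the emulator's status, then stop and start it
Get-CosmosDbEmulatorStatus
Stop-CosmosDbEmulator
Start-CosmosDbEmulator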

[Image: controlling the Cosmos emulator with the PowerShell commands]

The Cosmos Emulator’s Local Data Explorer

When I’ve started the emulator, I can browse to the URL below:

https://localhost:8081/_explorer/index.html

This opens the Emulator’s Data Explorer, which has some quickstart connection information, like connection strings and samples:

[Image: the emulator’s quickstart screen, showing connection strings and samples]

But more interestingly, I can also browse to a data explorer which allows me to browse databases in my Cosmos emulator, and collections within these databases using a SQL like language. Of course after I install the emulator, there are no databases or collections – but let’s start writing some .NET Core code to change that.

[Image: the emulator’s Data Explorer]

Let’s write to, and read from, some Cosmos Databases and Collections with .NET Core

I’m going to write a very simple application to interact with the Cosmos Emulator. This isn’t production ready code – this is just to examine how we might carry out some common database operations using .NET Core and Azure Cosmos.

I’m using Visual Studio 2019 with the .NET Core 3.0 preview (3.0.100-preview-010184), and I’ve created an empty .NET Core Console application.

My sample application will be to store information about interesting places near me – so I’ve chosen to create a Cosmos database with the title “LocalLandmarks”. I’m going to create a collection in this database for natural landmarks, and in this first blog I’m only going to store the landmark name.

From my application, I need to install a NuGet package to access the Azure Cosmos libraries.

Install-Package Microsoft.Azure.DocumentDB.Core

First let’s set up some parameters and objects:

  • Our Cosmos Emulator endpoint is just https://localhost:8081;
  • We know from the Data Explorer that the emulator key is (this is the same for everyone that uses the emulator):
C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==
  • I’m going to call my database “LocalLandmarks”;
  • I’m going to call the collection of natural landmarks “NaturalSites”;
  • My POCO for natural landmarks can be very simple for now:
namespace CosmosEmulatorSample
{
    public class NaturalSite
    {
        public string Name { get; set; }
    }
}

So I can specify a few static readonly strings for my application:

private static readonly string CosmosEndpoint = "https://localhost:8081";
private static readonly string EmulatorKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
private static readonly string DatabaseId = "LocalLandmarks";
private static readonly string NaturalSitesCollection = "NaturalSites";

We can create a client to connect to our Cosmos Emulator using our specified parameters and the code below:

// Create the client connection
var client = new DocumentClient(
    new Uri(CosmosEndpoint), 
    EmulatorKey, 
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct,
        ConnectionProtocol = Protocol.Tcp
    });

And now using this client we can create our “LocalLandmarks” database.

I’ve used the “Result” property to make many of the asynchronous calls synchronous, for simplicity in this introductory post.

// Create a new database in Cosmos
var databaseCreationResult = client.CreateDatabaseAsync(new Database { Id = DatabaseId }).Result;
Console.WriteLine("The database Id created is: " + databaseCreationResult.Resource.Id);

Within this database, we can also create a collection to store our natural landmarks.

// Now initialize a new collection for our objects to live inside
var collectionCreationResult = client.CreateDocumentCollectionAsync(
    UriFactory.CreateDatabaseUri(DatabaseId),
    new DocumentCollection { Id = NaturalSitesCollection }).Result;
 
Console.WriteLine("The collection created has the ID: " + collectionCreationResult.Resource.Id);

So let’s declare and initialize a NaturalSite object – an example of a natural landmark near me is the Giant’s Causeway.

// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Name = "Giant's Causeway" };

And I can pass this object to the Cosmos client’s “CreateDocumentAsync” method to write this to Cosmos, and I can specify the database and collection that I’m targeting in this method also.

// Add this POCO as a document in Cosmos to our natural site collection
var itemResult = client
    .CreateDocumentAsync(
        UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection), giantsCauseway)
    .Result;
 
Console.WriteLine("The document has been created with the ID: " + itemResult.Resource.Id);

At this point I could look at the Cosmos Emulator’s Data Explorer and see this in my database, as shown below:

[Image: the new document visible in the Cosmos emulator’s Data Explorer]

Finally I can read back from this NaturalSite collection by ID – I know the ID of the document I just created in Cosmos, so I can just call the Cosmos client’s “ReadDocumentAsync” method and specify the database Id, the collection I want to search in, and the document Id that I want to retrieve. I convert the results to a NaturalSite POCO, and then I can read properties back from it.

// Use the ID to retrieve the object we just created
var document = client
    .ReadDocumentAsync(
        UriFactory.CreateDocumentUri(DatabaseId, NaturalSitesCollection, itemResult.Resource.Id))
    .Result;
 
// Convert the document resource returned to a NaturalSite POCO
NaturalSite site = (dynamic)document.Resource;
 
Console.WriteLine("The returned document is a natural landmark with name: " + site.Name);

I’ve uploaded this code to GitHub here.

Wrapping up

In this post, I’ve written about the Azure Cosmos emulator which I’ve used to experiment with coding for Cosmos. I’ve written a little bit of very basic C# code which uses the Cosmos SDK to create databases and collections, write to these collections, and also read documents from collections by primary key. Of course this query might not be that useful – we probably don’t know the IDs of the documents saved to the database (and probably don’t care either as it’s non-semantic). In the next part of this series, I’ll write about querying Cosmos documents by object properties using .NET.



.net core, C# tip, MVC

Adding middleware to your .NET Core MVC pipeline to prettify HTML output with AngleSharp

I was speaking to a friend of mine recently about development and server side generated HTML, and they said that one thing they would love to do is improve how HTML code looks when it’s rendered. Often when they look at the HTML source of a page, the indentation is completely wrong, and there are huge amounts of whitespace and unexpected newlines.

And I agreed – I’ve seen that too. Sometimes when I’ve been trying to debug an issue in the rendered HTML, one of the first things I do is format and indent the HTML code so I can read and understand it. And why not – if my C# classes weren’t indented logically, I’d find them basically unreadable. Why should my HTML be any different?

So it occurred to me that I might be able to find a way to write some middleware for my .NET Core MVC website that formats and indents rendered HTML for me by default.

This post is just a fun little experiment for me – I don’t know if the code is performant, or if it scales. Certainly on a production site I might want to minimise the amount of whitespace in my HTML to improve download speeds rather than just change the formatting.
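As an aside – and this is an untested assumption on my part, so verify it against your version of the library – AngleSharp (which I use below) also ships a MinifyMarkupFormatter that goes the opposite way, so the same middleware idea could strip whitespace for production instead:

// Untested assumption: AngleSharp's minifying formatter, for production use
document.ToHtml(sw, new AngleSharp.Html.MinifyMarkupFormatter());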

Formatting and Indenting HTML

I’ve seen a few posts asking how to do this with HtmlAgilityPack – but even though HtmlAgilityPack is amazing, it won’t format HTML.

I’ve also seen people recommend a .NET wrapper for the Tidy library, but I’m going to use AngleSharp. AngleSharp is a .NET library that allows us to parse HTML, and contains a super useful formatter called PrettyMarkupFormatter.
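AngleSharp is available on NuGet, so pulling it into the project from the package manager console is just:

Install-Package AngleSharp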

var parser = new AngleSharp.Html.Parser.HtmlParser();
var document = parser.ParseDocument("<html><body>Hello, world</body></html>");
 
var sw = new StringWriter();
document.ToHtml(sw, new AngleSharp.Html.PrettyMarkupFormatter());
 
var indentedHtml = sw.ToString();

And I can encapsulate this in a function as below:

private static string PrettifyHtml(string newContent)
{
    var parser = new AngleSharp.Html.Parser.HtmlParser();
    var document = parser.ParseDocument(newContent);
 
    var sw = new StringWriter();
    document.ToHtml(sw, new AngleSharp.Html.PrettyMarkupFormatter());
    return sw.ToString();
}

Adding middleware to modify the HTML output

There’s lots of information on writing ASP.NET Core middleware here and I can build on this and the AngleSharp code to re-format the rendered HTML. The code below allows me to:

  • Check I’m in my development environment,
  • Read the rendered HTML from the response,
  • Correct the indentation using AngleSharp and the new PrettifyHtml method, and
  • Write the formatted HTML back to the Response.
if (env.IsDevelopment())
{
    app.Use(async (context, next) =>
    {
        var body = context.Response.Body;
 
        using (var updatedBody = new MemoryStream())
        {
            context.Response.Body = updatedBody;
 
            await next();
 
            context.Response.Body = body;
 
            updatedBody.Seek(0, SeekOrigin.Begin);
            var newContent = new StreamReader(updatedBody).ReadToEnd();
 
            await context.Response.WriteAsync(PrettifyHtml(newContent));
        }
    });
}

And now the HTML generated by my MVC application is formatted and indented correctly.

Wrapping up

This post is really just a proof of concept and for fun – I’ve restricted the effect to my development environment in case it doesn’t scale well. But hopefully this is useful to anyone trying to format HTML, or intercept an HTML response to modify it.




.net, C# tip

Instantiating a C# object from a string using Activator.CreateInstance in .NET

Recently I hit an interesting programming challenge – I need to write a library which can instantiate and use a C# class object from a second C# assembly.

Sounds simple enough…but the catch is that I’m only given some string information about the class at runtime, such as the class name, its namespace, and what assembly it belongs to.

Fortunately this is possible using the Activator.CreateInstance method in C#. First I need to format the namespace, class name and assembly name in a special way – as an assembly qualified name.

Let’s look at an example – the second assembly is called “MyTestProject” and the object I need to instantiate from my library looks like the one below.

namespace SampleProject.Domain
{
    public class MyNewTestClass
    {
        public int Id { get; set; }
 
        public string Name { get; set; }
 
        public string DoSpecialThing()
        {
            return "My name is MyNewTestClass";
        }
    }
}

This leads to the assembly qualified name:

"SampleProject.Domain.MyNewTestClass, MyTestProject"

Note that the format here is along the lines of:

"{namespace}.{class name}, "{assembly name}"

Another way of finding this assembly qualified name is to run the code below:

Console.WriteLine(typeof(MyNewTestClass).AssemblyQualifiedName);

This will output something like:

SampleProject.Domain.MyNewTestClass, MyTestProject, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

There’s extra information about Version, Culture and PublicKeyToken – you might need to use this if you’re targeting different versions of a library, but in my simple example I don’t need this so I won’t elaborate further here.

Now that I have the qualified name of the class, I can instantiate the object in my library using Activator.CreateInstance, as shown below:

const string objectToInstantiate = "SampleProject.Domain.MyNewTestClass, MyTestProject";
 
var objectType = Type.GetType(objectToInstantiate);

var instantiatedObject = Activator.CreateInstance(objectType);

But it’s a bit difficult to do anything useful with the instantiated object – at compile time we only know it has the type of Object, and I suppose we could call the ToString() method, but for a new class that’s of limited use. How can we access properties and methods in the instantiated class?

One way is to use the dynamic keyword to manipulate the new object

We could use the dynamic keyword with the instantiated object, and set/get methods dynamically, like in the code below:

const string objectToInstantiate = "SampleProject.Domain.MyNewTestClass, MyTestProject";
 
var objectType = Type.GetType(objectToInstantiate);

dynamic instantiatedObject = Activator.CreateInstance(objectType);
 
// set a property value
instantiatedObject.Name = "Test Name";
 
// get a property value
string name = instantiatedObject.Name;
 
// call a method - this outputs "My name is MyNewTestClass"
Console.Write(instantiatedObject.DoSpecialThing());

Another way to manipulate the instantiated object is by using a shared interface

We could make the original object implement an interface shared across all projects. If I add the interface below to my project…

namespace SampleProject.Domain
{
    public interface ITestClass
    {
        int Id { get; set; }
        string Name { get; set; }
        string DoSpecialThing();
    }
}

…then our original object could implement this interface:

namespace SampleProject.Domain
{
    public class MyNewTestClass : ITestClass
    {
        public int Id { get; set; }
 
        public string Name { get; set; }
 
        public string DoSpecialThing()
        {
            return "My name is MyNewTestClass";
        }
    }
}

So if we happen to know at design time that the object we want to instantiate implements the ITestClass interface, we can access methods exposed by that interface – there’s no need to use the dynamic keyword now.

const string objectToInstantiate = "SampleProject.Domain.MyNewTestClass, MyTestProject";
 
var objectType = Type.GetType(objectToInstantiate);

var instantiatedObject = Activator.CreateInstance(objectType) as ITestClass;
 
// set a property value
instantiatedObject.Name = "Test Name";
 
// get a property value
var name = instantiatedObject.Name;
 
// call a method - this outputs "My name is MyNewTestClass"
Console.Write(instantiatedObject.DoSpecialThing());

And of course if I have another domain object which implements the same interface but has different behaviour, like the one below…

namespace SampleProject.Domain
{
    public class DifferentTestClass : ITestClass
    {
        public int Id { get; set; }
 
        public string Name { get; set; }
 
        public string DoSpecialThing()
        {
            return "This is a different special thing";
        }
    }
}

…then I can use similar code to instantiate and manipulate the object – I just need to use the different object’s assembly qualified name:

const string objectToInstantiate = "SampleProject.Domain.DifferentTestClass, MyTestProject";
 
var objectType = Type.GetType(objectToInstantiate);

var instantiatedObject = Activator.CreateInstance(objectType) as ITestClass;
 
// set a property value
instantiatedObject.Name = "Other Test Name";
 
// get a property value
string name = instantiatedObject.Name;
 
// call a method - but this now outputs "This is a different special thing"
Console.Write(instantiatedObject.DoSpecialThing());

Hopefully this helps anyone else facing a similar challenge – it’s worth bearing in mind that reflection is very powerful, but also can be a bit slower than other techniques.
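And if the same types are being instantiated over and over, one easy mitigation – a sketch of the idea rather than code from the scenario above – is to cache the Type lookups, so each assembly qualified name only has to be resolved once:

// Hypothetical helper: cache Type lookups keyed by assembly qualified name
// (uses System.Collections.Concurrent)
private static readonly ConcurrentDictionary<string, Type> TypeCache =
    new ConcurrentDictionary<string, Type>();
 
private static object CreateFromName(string assemblyQualifiedName)
{
    var type = TypeCache.GetOrAdd(assemblyQualifiedName, Type.GetType);
    return Activator.CreateInstance(type);
}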