
Getting started with Azure Cosmos DB and .NET Core: Part #2 – string querying and ranged indexes

Last time I scratched the surface of creating databases and collections in Azure Cosmos using the emulator and some C# code written using .NET Core. This time I’m going to dig a bit deeper into how to query these databases and collections with C#, and show a few code snippets that I’m using to help remove cruft from my classes. I’m also going to write a little about Indexing Policies and how to use them to do useful string comparison queries.

Initializing Databases and Collections

I use the DocumentClient object to work with Cosmos, and previously I used its CreateDatabaseAsync and CreateDocumentCollectionAsync methods to create databases and document collections.

But after running my test project a few times it got a bit annoying to keep having to delete the database from my local Cosmos instance before running my code, or have the code throw an exception.

Fortunately I’ve discovered the Cosmos SDK has a nice solution for this – a couple of methods which are named CreateDatabaseIfNotExistsAsync and CreateDocumentCollectionIfNotExistsAsync.

string DatabaseId = "LocalLandmarks";
string NaturalSitesCollection = "NaturalSites";

var databaseUrl = UriFactory.CreateDatabaseUri(DatabaseId);
var collectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);

client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();

client.CreateDocumentCollectionIfNotExistsAsync(databaseUrl, new DocumentCollection { Id = NaturalSitesCollection }).Wait();

Now I can initialize my code repeatedly without having to tear down my database or handle exceptions.

What about querying by something more useful than the document resource ID?

Last time I wrote some code that took a POCO and inserted it as a document into the Cosmos emulator.

// Let's instantiate a POCO with a local landmark
var giantsCauseway = new NaturalSite { Name = "Giant's Causeway" };
 
// Add this POCO as a document in Cosmos to our natural site collection
var collectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, NaturalSitesCollection);
var itemResult = client.CreateDocumentAsync(collectionUri, giantsCauseway).Result;

Then I was able to query the database for that document using the document resource ID.

// Use the ID to retrieve the object we just created
var document = client
    .ReadDocumentAsync(
        UriFactory.CreateDocumentUri(DatabaseId, NaturalSitesCollection, itemResult.Resource.Id))
    .Result;

But that’s not really useful to me – I’d rather query by a property of the POCO. For example, I’d like to query by the Name property, perhaps with an object instantiation and method signature like the suggestion below:

// Instantiate with the DocumentClient and database identifier
var cosmosQueryFacade = new CosmosQueryFacade<NaturalSite>
{
    DocumentClient = client,
    DatabaseId = DatabaseId,
    CollectionId = NaturalSitesCollection
};
 
// Querying one collection
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name == "Giant's Causeway").Result;

There’s a really useful sample project available with the Cosmos emulator which provided some code that I’ve adapted – you can access it from the Quickstart screen in the Data Explorer (available at https://localhost:8081/_explorer/index.html after you start the emulator). The image below shows how I’ve accessed the sample, which is available by clicking on the “Download” button after selecting the .NET Core tab.

[Image: downloading the sample app from the .NET Core tab of the emulator's Quickstart screen]

The code below shows a query facade class that I have created – I can instantiate the object with parameters like the Cosmos DocumentClient, and the database identifier.

I’m going to be enhancing this Facade over the next few posts in this series, and I’ll also look at how to use the new version 3.0 of the Cosmos SDK, which has recently entered public preview.

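using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

public class CosmosQueryFacade<T> where T : class
{
    public string CollectionId { get; set; }

    public string DatabaseId { get; set; }

    public DocumentClient DocumentClient { get; set; }

    public async Task<IEnumerable<T>> GetItemsAsync(Expression<Func<T, bool>> predicate)
    {
        var documentCollectionUrl = UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId);

        // Build the LINQ query and convert it so results can be read page by page
        var query = DocumentClient.CreateDocumentQuery<T>(documentCollectionUrl)
            .Where(predicate)
            .AsDocumentQuery();

        var results = new List<T>();

        // Keep reading pages until the query has no more results
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<T>());
        }

        return results;
    }
}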
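The facade takes an Expression<Func<T, bool>> rather than a plain Func<T, bool> so that the SDK's LINQ provider can inspect the predicate and translate it into a query. The sketch below is an assumption-laden illustration of that difference rather than part of the facade: it compiles the same kind of expression and runs it against an in-memory list, which can be handy for checking a predicate without the emulator. The NaturalSite class here is a minimal stand-in for my POCO, and PredicateCheck and FilterInMemory are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Minimal stand-in for the NaturalSite POCO used in this post (assumption)
public class NaturalSite
{
    public string Name { get; set; }
}

public static class PredicateCheck
{
    // Hypothetical helper: runs an expression predicate against in-memory objects
    public static List<NaturalSite> FilterInMemory(
        IEnumerable<NaturalSite> sites,
        Expression<Func<NaturalSite, bool>> predicate)
    {
        // Compile() turns the expression tree into an ordinary delegate
        Func<NaturalSite, bool> compiled = predicate.Compile();

        return sites.Where(compiled).ToList();
    }
}
```

A predicate that passes in memory is not guaranteed to be accepted by Cosmos, because the server-side query also depends on how the collection is indexed.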

This class lets me query when I know the full name of the site. But what happens if I want to do a different kind of query – instead of exact comparison, what about something like “StartsWith”?

// Querying using LINQ StartsWith  
var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;

If I run this, I get an error:

An invalid query has been specified with filters against path(s) 
that are not range-indexed. 
Consider adding allow scan header in the request.

What’s gone wrong? The clue is in the error message – I don’t have the right indexes applied to my collection.
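The error message also mentions an "allow scan header". In the DocumentDB .NET SDK this corresponds, as far as I understand it, to the EnableScanInQuery property on FeedOptions, which lets the query run as a scan instead of being rejected. The sketch below shows what that might look like. ScanQueryExample, BuildScanOptions and GetSitesStartingWithAsync are hypothetical names, and NaturalSite is a minimal stand-in for my POCO. I'd treat this as a stopgap, since a scan is likely to be slower and cost more than an indexed query, and the rest of this post fixes the index instead.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Minimal stand-in for the NaturalSite POCO used in this post (assumption)
public class NaturalSite
{
    public string Name { get; set; }
}

public static class ScanQueryExample
{
    // Hypothetical helper: feed options that opt in to scanning
    // when no suitable range index exists for the filtered path
    public static FeedOptions BuildScanOptions()
    {
        return new FeedOptions { EnableScanInQuery = true };
    }

    // Hypothetical helper: runs a StartsWith query with scanning enabled
    public static async Task<List<NaturalSite>> GetSitesStartingWithAsync(
        DocumentClient client, string databaseId, string collectionId, string prefix)
    {
        var collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);

        var query = client
            .CreateDocumentQuery<NaturalSite>(collectionUri, BuildScanOptions())
            .Where(site => site.Name.StartsWith(prefix))
            .AsDocumentQuery();

        var results = new List<NaturalSite>();

        // Read every page of results
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<NaturalSite>());
        }

        return results;
    }
}
```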

Indexing Policies in Cosmos

From Wikipedia, an index is a data structure that improves the speed of data retrieval from a database. But as we’ve seen from the error above, in Cosmos it’s even more than this. Certain types of index won’t permit certain types of comparison operation, and when I tried to carry out that operation, by default I got an error (rather than just a slow response).

One of the really well publicised benefits of Cosmos is that documents added to collections in an Azure Cosmos database are automatically indexed. And while that’s extremely powerful and useful, it’s not magic: Cosmos can’t know which indexes match my specific business logic, and won’t add them.

There are three types of indexes in Cosmos:

  • Hash, used for:
    • Equality queries, e.g. m => m.Name == "Giant's Causeway"
  • Range, used for:
    • Equality queries
    • Comparison within a range, e.g. m => m.Age > 5, or m => m.Name.StartsWith("Giant")
    • Ordering, e.g. OrderBy(m => m.Name)
  • Spatial, used for geo-spatial data (more on this in future posts).
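One way to picture the difference between the Hash and Range kinds is with ordinary .NET collections. This is only an analogy, not a description of how Cosmos stores its indexes: a HashSet can tell me quickly whether an exact value exists but knows nothing about ordering, while a SortedSet keeps values in order, so everything sharing a prefix sits in one contiguous range. The class name, method names and sample place names below are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexAnalogy
{
    // Sample names used purely for illustration
    private static readonly string[] Names =
    {
        "Giant's Causeway", "Giant's Ring", "Lough Neagh", "Mourne Mountains"
    };

    // "Hash-like" lookup: good for exact matches only
    public static bool ExistsExactly(string name)
    {
        var hashed = new HashSet<string>(Names, StringComparer.Ordinal);
        return hashed.Contains(name);
    }

    // "Range-like" lookup: ordered values make prefix searches a range read
    public static List<string> StartingWith(string prefix)
    {
        var ordered = new SortedSet<string>(Names, StringComparer.Ordinal);

        // For typical text, every value with this prefix sorts between
        // the prefix itself and the prefix followed by the largest char
        return ordered.GetViewBetween(prefix, prefix + char.MaxValue).ToList();
    }
}
```

Calling IndexAnalogy.StartingWith("Giant") on this sample data returns the two names that begin with "Giant", while the hash-style lookup can only answer whether a complete name is present.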

So I’ve created a collection called “NaturalSites” in my Cosmos emulator, and added some data to it. But how can I find out what the present indexing policy is? That’s pretty straightforward: it’s all in the Data Explorer again. Go to the Explorer tab, expand the database to see its collections, and then click on the “Scale & settings” menu item. This will show you the indexing policy for the collection.

[Image: the indexing policy shown under “Scale & settings” in the Data Explorer]

When I created the database and collection from C#, the indexing policy created by default is shown below:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}

I can see that in the list of indexes for my collection, the String dataType has a Hash index (the second entry in the indexes array above). We know this kind of index is good for equality comparisons, but as the earlier error message suggests, it needs to be a Range index before I can do comparisons more complex than equality between two strings.

I can modify the index policy for the collection in C#, as shown below:

// Set up Uris to create database and collection
var databaseUri = UriFactory.CreateDatabaseUri(DatabaseId);
var constructedSiteCollectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, ConstructedSitesCollection);
 
// Create the database
client.CreateDatabaseIfNotExistsAsync(new Database { Id = DatabaseId }).Wait();
 
// Create a document collection
var naturalSitesCollection = new DocumentCollection { Id = NaturalSitesCollection };
// Now create the policy to make strings a Ranged index
var indexingPolicy = new IndexingPolicy();
indexingPolicy.IncludedPaths.Add(new IncludedPath
{
    Path = "/*",
    Indexes = new Collection<Microsoft.Azure.Documents.Index>()
    {
        new RangeIndex(DataType.String) { Precision = -1 }
    }
});

// Now assign the policy to the document collection
naturalSitesCollection.IndexingPolicy = indexingPolicy;
 
// And finally create the document collection
client.CreateDocumentCollectionIfNotExistsAsync(databaseUri, naturalSitesCollection).Wait();
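One thing to watch: CreateDocumentCollectionIfNotExistsAsync only applies the indexing policy when it actually creates the collection. If the collection already exists with the default policy, it is returned unchanged. The sketch below shows one way to update an existing collection instead, by reading it, swapping the included path and replacing it. IndexingPolicyUpdater and its method names are hypothetical, and I've included the Number range index explicitly so the result matches the default behaviour for numbers. On a collection holding a lot of data I'd expect the re-indexing to take some time.

```csharp
using System.Collections.ObjectModel;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class IndexingPolicyUpdater
{
    // Hypothetical helper: a root path with Range indexes for strings and numbers
    public static IncludedPath BuildRangeIndexedRootPath()
    {
        return new IncludedPath
        {
            Path = "/*",
            Indexes = new Collection<Microsoft.Azure.Documents.Index>
            {
                new RangeIndex(DataType.String) { Precision = -1 },
                new RangeIndex(DataType.Number) { Precision = -1 }
            }
        };
    }

    // Hypothetical helper: updates the policy of a collection that already exists
    public static async Task UseRangeIndexForStringsAsync(
        DocumentClient client, string databaseId, string collectionId)
    {
        var collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);

        // Read the current collection definition, including its indexing policy
        DocumentCollection collection =
            (await client.ReadDocumentCollectionAsync(collectionUri)).Resource;

        // Swap the existing included paths for the range-indexed root path
        collection.IndexingPolicy.IncludedPaths.Clear();
        collection.IndexingPolicy.IncludedPaths.Add(BuildRangeIndexedRootPath());

        // Send the modified definition back to Cosmos
        await client.ReplaceDocumentCollectionAsync(collection);
    }
}
```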

And now if I inspect this collection in the Data Explorer, the indexing policy created is shown below. The first entry in the indexes array shows that the kind of index used for the String dataType is now Range.

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": []
}

So when I run the code below to look for sites that start with “Giant”, the code now works and returns objects rather than throwing an exception.

var sites = cosmosQueryFacade.GetItemsAsync(m => m.Name.StartsWith("Giant")).Result;

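There are many more indexing examples in the Microsoft documentation (https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-dotnet-samples#indexing-examples) if you’re interested.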

Wrapping up

I’ve taken a small step beyond the previous part of this tutorial, and I’m now able to query for strings that exactly and partially match values in my Cosmos database. As usual I’ve uploaded my code to GitHub and you can pull the code from here. Next time I’m going to try to convert my code to the new version of the SDK, which is now in public preview.

https://www.c-sharpcorner.com/article/indexing-in-azure-cosmos-db/

https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-dotnet-samples#indexing-examples
