
Using Polly and Flurl to improve your website

This post is about how to use The Polly Project to make a .NET website more resilient to transient failures. I use Flurl to consume RESTful web services, so I’ve some Flurl-specific code later on, but I hope this post is useful to anyone who’s interested in learning what Polly is, what it’s for and how it can help you.

So here’s a problem

Let’s pretend you run your business through a website, and part of your code calls out to a web service that another company supplies.

And, every once in a while, errors from this web service appear in your logs. Sometimes the HTTP status code is a 404 (not found), sometimes the code is a 503 (service unavailable), and other times you see a 504 (gateway timeout). There’s no pattern, it goes away as quickly as it starts, and you’d really like to get this fixed before customers start cancelling their subscriptions to your service.

You call up the business running the remote web service, and their answer is a bit… vague. Every so often they restart their web servers which takes their service down for a couple of seconds, and at certain times of the day they get spikes of traffic which causes their system to max out for up to 5 seconds at a time. They’re apologetic, and they expect to migrate to new, better infrastructure in about 6 months. But their only workaround is for you to re-query the service.

So you could be forgiven for going spare right now – this response doesn’t fix anything. This company is the only place you can get the data you need so you’re locked in. And you know your customers are seeing errors because it’s right there staring at you from your website logs. Asking your customers to ‘just hit refresh’ when they get an error is a great way to lose business and win a bad reputation.

You can use Polly to help solve this problem

When I first read about Polly a long while back, I was really interested but I wasn’t sure how I could apply it to the project I was working on.  What I wanted was to find a post that described a real world scenario that I could recognise and identify with, and how Polly would help with that.

Since then, I’ve worked on projects a little bit like the one I described above – on one occasion, when I raised a ticket to say that we were having intermittent problems with a web service, I was told that the workaround was to ‘hit refresh’. And since there’s a workaround, it’s only going to be raised as a medium priority issue (which feels like a coded message for ‘we’re not even going to look at this’). This kind of thing drives me crazy and it’s exactly the kind of problem that Polly can at least mitigate.

I’ve also met people who are doing really interesting work with hardware devices in .NET, and need to be able to handle hardware that can only deal with single threads – Polly allows the application to handle occasions when it doesn’t receive an acknowledgement from the hardware by waiting for a while and then retrying.

Let’s get to some code

I’ve pushed all of the code below to a repo on my GitHub, so you can pull it locally and step through it yourself.

First, a couple of harnesses to simulate a flakey web-service

So I’ve written a simple (and really awful) web-service project to simulate random transient errors. The service is just meant to return what day it is, but it’ll only work about two times out of three (five times in eight, to be exact). The rest of the time it’ll return either a 404 (Not Found), a 503 (Service Unavailable), or it’ll hang for 10 seconds and then return a 504 (Gateway Timeout).

using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Newtonsoft.Json;
 
namespace WorldsWorstWebService.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class WeekDayController : ControllerBase
    {
        [HttpGet]
        public IActionResult Get()
        {
            // Manufacture 404, 503 and 504 errors for three of the eight possible values (0 to 7)
            var randomNumber = new Random();
            var randomInteger = randomNumber.Next(0, 8);
 
            switch (randomInteger)
            {
                case 0:
                    Debug.WriteLine("Webservice:About to serve a 404...");
                    return StatusCode(StatusCodes.Status404NotFound);
 
                case 1:
                    Debug.WriteLine("Webservice:About to serve a 503...");
                    return StatusCode(StatusCodes.Status503ServiceUnavailable);
 
                case 2:
                    Debug.WriteLine("Webservice:Sleeping for 10 seconds then serving a 504...");
                    Thread.Sleep(10000);
                    Debug.WriteLine("Webservice:About to serve a 504...");
 
                    return StatusCode(StatusCodes.Status504GatewayTimeout);
                default:
                {
                    var formattedCustomObject = JsonConvert.SerializeObject(
                        new
                        {
                            WeekDay = DateTime.Today.DayOfWeek.ToString()
                        });
 
                    Debug.WriteLine("Webservice:About to correctly serve a 200 response");
 
                    return Ok(formattedCustomObject);
                }
            }
        }
    }
}

I’ve also written another web application project that consumes this service using Flurl.

If you’re interested in Flurl and RESTful web services, I’ve written more about using it here.

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Flurl.Http;
using Microsoft.AspNetCore.Mvc;
using MyWebsite.Models;
 
namespace MyWebsite.Controllers
{
    public class HomeController : Controller
    {
        public async Task<IActionResult> Index()
        {
            try
            {
                var weekday = await "https://localhost:44357/api/weekday"
                    .GetJsonAsync<WeekdayModel>();
 
                Debug.WriteLine("[App]: successful");
 
                return View(weekday);
            }
            catch (Exception e)
            {
                Debug.WriteLine("[App]: Failed - " + e.Message);
                throw;
            }
        }
    }
}

So I carried out a simple experiment: I ran both projects and hit my website 20 times. I mostly got successful responses, but I still got a load of failures. I’ve pasted the debug log below.

[App]: successful
[App]: Failed - Call failed with status code 503 (Service Unavailable): GET https://localhost:44357/api/weekday
[App]: successful
[App]: successful
[App]: successful
[App]: Failed - Call failed with status code 504 (Gateway Timeout): GET https://localhost:44357/api/weekday
[App]: successful
[App]: successful
[App]: Failed - Call failed with status code 503 (Service Unavailable): GET https://localhost:44357/api/weekday
[App]: successful
[App]: successful
[App]: successful
[App]: successful
[App]: successful
[App]: successful
[App]: Failed - Call failed with status code 503 (Service Unavailable): GET https://localhost:44357/api/weekday
[App]: successful
[App]: Failed - Call failed with status code 503 (Service Unavailable): GET https://localhost:44357/api/weekday
[App]: successful
[App]: Failed - Call failed with status code 404 (Not Found): GET https://localhost:44357/api/weekday

So out of 20 page hits, my test web app failed 6 times – about a 30% failure rate. That’s pretty poor (and about consistent with what we expect from the flakey web service).

Let’s say I don’t control the behaviour of the web services upstream of my web app, so I can’t change the reason why my web app is failing, but let’s see if Polly allows me to reduce the number of failures that my web app users see.

Wiring up Polly

First let’s design some rules, also known as ‘policies’

So what’s a ‘policy’? Basically it’s just a rule that’ll help mitigate the intermittent problem.

For example – the web service frequently delivers 404 and 503 messages, but it’s back up again quickly. So a policy could be:

Retry Policy: When the web service returns an unsuccessful HTTP code, wait a second and try again. If it still fails, wait three seconds and try again, and if it still fails, then wait five more seconds and try one more time. If it fails after that, the service is dead and we need to deal with the error.

We also know that the web service hangs for 10 seconds before delivering a 504 timeout message. I don’t want my customers to wait for this long – after a couple of seconds I’d like my app to give up, and execute the ‘Retry Policy’ above.

Timeout Policy: When I’ve been waiting for a response for longer than 2 seconds, cut my losses and execute the Retry Policy.

Wrapping these policies together forms a ‘Policy Strategy’.
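It’s worth working out what these two rules cost in the worst case before committing to them. Here’s a minimal sketch of that arithmetic, assuming the timeout applies to each individual attempt (which is how the strategy is wrapped later in this post). The RetryBudget class and its WorstCase method are names I’ve made up for illustration; they aren’t part of Polly.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Polly): estimates the longest time a caller
// could wait when every attempt times out and every retry is used.
public static class RetryBudget
{
    public static TimeSpan WorstCase(TimeSpan perAttemptTimeout, TimeSpan[] waits)
    {
        // One initial attempt plus one attempt per configured wait.
        var attempts = waits.Length + 1;

        // Total time spent sleeping between attempts.
        var totalWait = TimeSpan.FromTicks(waits.Sum(w => w.Ticks));

        // Total time spent waiting on attempts that all hit the timeout.
        var totalAttemptTime = TimeSpan.FromTicks(perAttemptTimeout.Ticks * attempts);

        return totalAttemptTime + totalWait;
    }
}
```

With a 2 second timeout and waits of 1, 3 and 5 seconds, that’s four attempts of up to 2 seconds each (8 seconds) plus 9 seconds of waiting, so RetryBudget.WorstCase returns 17 seconds. That ignores network overhead, but it’s a useful upper bound when deciding whether the waits are too generous for a page request.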

So the first step is to install the Polly NuGet package in the web app project:

Install-Package Polly

Polly is an open source project hosted on GitHub, with a BSD licence. It’s also a member of the .NET Foundation.

So what would these policies look like in code? The timeout policy is like the code below, where we can just pass the number of seconds to wait as a parameter:

var timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(2);

There’s also an overload that takes a delegate to run when the timeout fires, and I’ve used that below to write some debug messages.

var timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(2, (context, timeSpan, task) =>
{
    Debug.WriteLine($"[App|Policy]: Timeout delegate fired after {timeSpan.Seconds} seconds");
    return Task.CompletedTask;
});

The retry policy is a little different from the timeout policy:

  • I first specify the conditions under which I should retry – there must be an unsuccessful HTTP status code, or there must be a timeout exception.
  • Then I can specify how to wait and retry – first wait 1 second before retrying, then wait 3 seconds, then wait 5 seconds.
  • Finally I’ve used the overload with a delegate to write comments to debug.

var retryPolicy = Policy
    .HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .Or<TimeoutRejectedException>()
    .WaitAndRetryAsync(new[]
        {
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(3),
            TimeSpan.FromSeconds(5)
        },
        (result, timeSpan, retryCount, context) =>
        {
            Debug.WriteLine($"[App|Policy]: Retry delegate fired, attempt {retryCount}");
        });

And I can bundle these policies together as a single policy strategy like this:

var policyStrategy = Policy.WrapAsync(retryPolicy, timeoutPolicy);

I’ve grouped these policies in their own class and pasted the code below.

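using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;
using Polly.Timeout;
using Polly.Wrap;
 
// These type names match the Polly version used when this was written (6.x);
// from Polly 7 the async policies are AsyncTimeoutPolicy<T>, AsyncRetryPolicy<T> and AsyncPolicyWrap<T>.
public static class Policies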
{
    private static TimeoutPolicy<HttpResponseMessage> TimeoutPolicy
    {
        get
        {
            return Policy.TimeoutAsync<HttpResponseMessage>(2, (context, timeSpan, task) =>
            {
                Debug.WriteLine($"[App|Policy]: Timeout delegate fired after {timeSpan.Seconds} seconds");
                return Task.CompletedTask;
            });
        }
    }
 
    private static RetryPolicy<HttpResponseMessage> RetryPolicy
    {
        get
        {
            return Policy
                .HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
                .Or<TimeoutRejectedException>()
                .WaitAndRetryAsync(new[]
                    {
                        TimeSpan.FromSeconds(1),
                        TimeSpan.FromSeconds(3),
                        TimeSpan.FromSeconds(5)
                    },
                    (delegateResult, timeSpan, retryCount, context) =>
                    {
                        Debug.WriteLine(
                            $"[App|Policy]: Retry delegate fired, attempt {retryCount}");
                    });
        }
    }
 
    public static PolicyWrap<HttpResponseMessage> PolicyStrategy => Policy.WrapAsync(RetryPolicy, TimeoutPolicy);
}

Now I want to apply this Policy Strategy to every outgoing call to the 3rd party web service.

How do I apply these policies when I’m using Flurl?

One of the things I really like about using Flurl to consume 3rd party web services is that I don’t need to instantiate an HttpClient, or worry about running out of available sockets every time I make a call – Flurl handles all of this in the background for me.

But that also means it’s not immediately obvious how I can configure calls to the HttpClient used in the background so that my policy strategy is applied to each call.

Fortunately Flurl provides a way to do this by adding a few new classes to my web app project, and a configuration instruction. I can configure Flurl’s settings in my web app’s Startup file to make it use a different implementation of Flurl’s default HttpClientFactory (which overrides how HTTP messages are handled).

public void ConfigureServices(IServiceCollection services)
{
    //...other service configuration here
 
    FlurlHttp.Configure(settings => settings.HttpClientFactory = new PollyHttpClientFactory());
}

The PollyHttpClientFactory is a subclass of Flurl’s default HttpClientFactory. It overrides how HTTP messages are handled, and instead uses our own PolicyHandler.

public class PollyHttpClientFactory : DefaultHttpClientFactory
{
    public override HttpMessageHandler CreateMessageHandler()
    {
        return new PolicyHandler
        {
            InnerHandler = base.CreateMessageHandler()
        };
    }
}

And the PolicyHandler is where we apply our rules (the policy strategy) to outgoing HTTP requests.

public class PolicyHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Policies.PolicyStrategy.ExecuteAsync(ct => base.SendAsync(request, ct), cancellationToken);
    }
}
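If the DelegatingHandler part feels a bit magic, here’s a minimal sketch that shows the same mechanism with no Polly or Flurl involved. Everything in it is an assumption made for illustration: FlakyStubHandler stands in for the flaky web service, SimpleRetryHandler is a hand-rolled retry loop standing in for the policy strategy, and the example.test URL is never actually contacted because the stub answers every request.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the flaky service: returns 503 a fixed number of times, then 200.
public class FlakyStubHandler : HttpMessageHandler
{
    private int _failuresLeft;
    public int CallCount { get; private set; }

    public FlakyStubHandler(int failures) => _failuresLeft = failures;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        CallCount++;
        var status = _failuresLeft-- > 0
            ? HttpStatusCode.ServiceUnavailable
            : HttpStatusCode.OK;
        return Task.FromResult(new HttpResponseMessage(status));
    }
}

// Hand-rolled retry handler (not Polly): re-sends the request through the inner
// handler until it succeeds or the retries run out. No waiting between attempts.
public class SimpleRetryHandler : DelegatingHandler
{
    private readonly int _maxRetries;

    public SimpleRetryHandler(int maxRetries, HttpMessageHandler inner) : base(inner)
        => _maxRetries = maxRetries;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        for (var retry = 0; retry < _maxRetries && !response.IsSuccessStatusCode; retry++)
        {
            response.Dispose(); // discard the failed response before trying again
            response = await base.SendAsync(request, cancellationToken);
        }
        return response;
    }
}

public static class Demo
{
    // Returns the final status code and how many times the stub service was called.
    public static async Task<(HttpStatusCode, int)> RunAsync()
    {
        var stub = new FlakyStubHandler(failures: 2);
        using (var client = new HttpClient(new SimpleRetryHandler(3, stub)))
        {
            var response = await client.GetAsync("http://example.test/api/weekday");
            return (response.StatusCode, stub.CallCount);
        }
    }
}
```

The caller makes a single GetAsync call and sees a 200, even though the stub was hit three times behind the scenes. The PolicyHandler above sits in the same position in the pipeline; it just hands the retry and timeout decisions to Polly instead of a for loop.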

Now let’s see if this improves things

With the policies applied to requests to the 3rd party web service, I repeated the earlier experiment and hit my application again 20 times.

[App]: successful
[App]: successful
[App|Policy]: Timeout delegate fired after 2000
[App|Policy]: Retry delegate fired, attempt 1
[App|Policy]: Timeout delegate fired after 2000
[App|Policy]: Retry delegate fired, attempt 2
[App]: successful
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App]: successful
[App]: successful
[App|Policy]: Timeout delegate fired after 2000
[App|Policy]: Retry delegate fired, attempt 1
[App]: successful
[App]: successful
[App]: successful
[App]: successful
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App]: successful
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App|Policy]: Retry delegate fired, attempt 2
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App|Policy]: Retry delegate fired, attempt 2
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App|Policy]: Retry delegate fired, attempt 2
[App]: successful
[App|Policy]: Retry delegate fired, attempt 1
[App|Policy]: Retry delegate fired, attempt 2
[App]: successful
[App]: successful
[App]: successful
[App]: successful

This time, my users would have experienced no application failures in those 20 page hits. But all those ‘[App|Policy]’ lines mark the times that the web service failed or timed out and our policy was to try again – which eventually led to a successful response from my web app.

In fact, I went on to hit the page 100 times and only saw two errors in total, so the total failure rate that my users experience now is at about 2% – way better than the 30% failure rate experienced originally.
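That 2% figure is roughly what the arithmetic predicts. Here’s a quick sketch of the calculation, assuming each call to the test service fails independently with probability 3/8 (three failing values out of the eight the service can pick) and that the user only sees an error when the first attempt and all three retries fail. FailureOdds is a name I’ve made up for this example.

```csharp
using System;

// Hypothetical helper: chance that every attempt fails, assuming each attempt
// fails independently with the same probability.
public static class FailureOdds
{
    public static double AfterRetries(double perCallFailure, int retries)
    {
        // One initial attempt plus the retries must all fail.
        return Math.Pow(perCallFailure, retries + 1);
    }
}
```

FailureOdds.AfterRetries(3.0 / 8, 3) comes out at about 0.0198, so just under 2% of page hits should still fail, compared with 37.5% with no retries at all. A real service is unlikely to fail this independently (an outage tends to affect consecutive calls), so treat this as a best case rather than a promise.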

Obviously this is a very contrived example – real world examples are likely to be a bit more complex. And your rules and policies will be different to mine. Instead of retrying, maybe you want to fall back to a different action (e.g. hit a different web service, pull from a cache etc.) – and Polly has its own fallback mechanism to do this. You’ll have to design your own rules and policies to handle the particular failure modes that you face.
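To give a flavour of the fallback idea, here’s a minimal sketch of a fallback policy that substitutes a placeholder response when the call fails or times out. It assumes the Polly NuGet package is installed; the FallbackSketch class and the ‘Unknown’ placeholder payload are my own inventions for illustration, and a real app would more likely return cached data.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

public static class FallbackSketch
{
    // Hypothetical placeholder payload; a real app might read the last good value from a cache.
    private const string PlaceholderJson = "{\"WeekDay\":\"Unknown\"}";

    public static IAsyncPolicy<HttpResponseMessage> Build()
    {
        return Policy
            // Same failure conditions as the retry policy in this post.
            .HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
            .Or<TimeoutRejectedException>()
            // Build a fresh substitute response each time the fallback is used.
            .FallbackAsync(ct => Task.FromResult(
                new HttpResponseMessage(HttpStatusCode.OK)
                {
                    Content = new StringContent(PlaceholderJson, Encoding.UTF8, "application/json")
                }));
    }
}
```

If you wanted the fallback to kick in only after the retries are exhausted, it would go first (outermost) in the list passed to Policy.WrapAsync, ahead of the retry and timeout policies.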

Wrapping up

I’d a couple of aims when writing this post – first of all I wanted to come up with a couple of different scenarios for how Polly could be used in your application. I mostly work with web applications and web services, and I also like using Flurl for accessing these services, so that’s what this article focusses on. But I’ve just scratched the surface here – Polly can do way more than that. Check out the Polly Wiki to find out more about it, or look at the samples.

 



