.net core, Azure

Solving some strange behaviour on Azure App Service with a URL containing non-ASCII characters

Recently I hit a strange problem after deploying an application to my Azure App Service – I saw errors appearing occasionally with some HTTP GET requests for URLs with some unexpected escape characters.

After digging into this a bit more, I found the solution – in this post, I’ve created a very simple scenario to demonstrate recreating the issue, and then I describe how I solved the problem.

Let’s say I’m working on a C# MVC site which gives some information about different countries, and I’d like to display the page’s URL in a friendly format as shown below:

/Home/Detail/{Country Name}

e.g. running through the ISO 3166-1 list of countries
/Home/Detail/Afghanistan
/Home/Detail/Åland Islands
/Home/Detail/Albania
...etc.

So I might want to validate the country name passed in the URL against a known list of countries (e.g. ISO 3166-1), and then query a data source for the information I want to display.
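A minimal sketch of that validation step, assuming a small in-memory list. The class name CountryValidator and the truncated list are hypothetical; a real application would load the full ISO 3166-1 list from its own data source.

```csharp
using System;
using System.Collections.Generic;

public static class CountryValidator
{
    // Hypothetical, truncated sample of ISO 3166-1 country names.
    private static readonly HashSet<string> KnownCountries =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Afghanistan",
            "Åland Islands",
            "Albania",
            "Canada",
        };

    // Returns true only when the candidate matches a name in the list.
    public static bool IsKnownCountry(string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        return KnownCountries.Contains(candidate.Trim());
    }
}
```

A check like this only succeeds if the name arrives exactly as it appears in the list, so any unexpected change to the characters in the URL makes the lookup fail.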

And redirecting to a friendly URL from within a controller is a pretty ordinary thing to do in MVC – I might have carried out some other actions in an HTTP POST request, or I might simply redirect as shown below.

[Code listing not preserved: a controller action that redirects to /Home/Detail/{Country Name}]

As an example, the image below shows my locally hosted web application, with the URL in the format I want, and a title on the page showing some information about Canada (which I could have retrieved from a database, but in this example I just display the country name from the URL).

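[Image not preserved: the locally hosted page for Canada, with the country name in the URL and in the page title]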

I know I shouldn’t be displaying data on the page straight from the URL – I’m only doing it in this example to make the demonstration clearer. Don’t do this in your application.

So far, so trivial.

And if I increase the complexity a little bit with a non-ASCII character in the country name – like the “Åland Islands”, which is second in the list of countries – it all still works locally. The space between the two words becomes percent-encoded (%20) but that makes sense to me.

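[Image not preserved: the locally hosted page for “Åland Islands”, with the space in the URL shown as %20]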

Now let’s deploy this application to an Azure App Service.

This is where it all goes wrong

Because if we browse to our App Service website with a country that has a non-ASCII character…it displays completely differently!

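[Image not preserved: the App Service page, with “Å” in the URL shown as two other non-ASCII characters]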

The Unicode character hasn’t been percent-encoded; it has actually been rendered as two other non-ASCII characters. So my nice, readable URL has become confusing, and my application can’t easily compare the country name in the URL to the list of ISO 3166-1 country names. Now that I’ve deployed to my cloud platform, I’ve got a problem I didn’t encounter locally.

The character “Å” (U+00C5) is encoded in UTF-8 as two bytes, 0xC3 and 0x85, and each of those bytes has now been rendered as a separate character – you can read more about this character here: https://en.wikipedia.org/wiki/%C3%85
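To make the two-characters effect concrete, here is a small sketch that reproduces it in isolation. It assumes the UTF-8 bytes end up being read with a single-byte encoding (ISO-8859-1 here); that is only one plausible way to get this kind of output, not a confirmed account of what App Service does. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Shows the UTF-8 bytes of a string as hex, e.g. "C3-85" for "Å".
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Encodes as UTF-8, then (wrongly) decodes one byte per character.
    // Assumption: ISO-8859-1 stands in for whatever single-byte
    // interpretation is happening in the real pipeline.
    public static string DecodeUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

MojibakeDemo.Utf8Hex("Å") returns C3-85, and MojibakeDemo.DecodeUtf8AsLatin1("Å") returns a two-character string (U+00C3 followed by U+0085) instead of the single character that went in.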

Interesting…what if I browse directly to the page using “Åland Islands” in the URL?

…it works as expected. Stranger and stranger.

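[Image not preserved: the App Service page for “Åland Islands”, displayed correctly when browsed to directly]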

So in summary, I’ve found that on an Azure App Service, if I try to redirect to a page with a URL containing non-ASCII characters, the resulting URL which is actually browsed to will be encoded differently to how I expect.

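For what it’s worth, I don’t think this is a bug: RFC 3986 only allows a limited set of ASCII characters in a URI, so anything else is meant to be percent-encoded. It’s worth checking out the standards for more information about what is and isn’t allowed in a URI/URL: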

URIs: https://tools.ietf.org/html/rfc3986

URLs: http://www.faqs.org/rfcs/rfc1738.html

What’s the solution?

After a lot of digging, I found there’s a really simple solution to this in .NET Core’s System namespace – Uri.EscapeDataString(string) – which converts every character, except for RFC 2396 unreserved characters, to its hexadecimal representation.

I can change my code’s redirect to be this:

LocalRedirect($"/Home/Detail/{Uri.EscapeDataString(validatedCountryName)}");

And when the code below runs…

Uri.EscapeDataString("Åland Islands")

it is percent-encoded to the UTF-8 form to become:

%C3%85land%20Islands

This isn’t really human readable right now, but helpfully my browser reconverts it back into a readable string in the URL, and it’s rendered recognisably by my page.
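The round trip can be sketched without a web server at all. In the example below, the FriendlyUrl class and its two methods are hypothetical helpers, and the /Home/Detail/ prefix is taken from the URL format used in this post; only Uri.EscapeDataString and Uri.UnescapeDataString are real framework calls.

```csharp
using System;

public static class FriendlyUrl
{
    private const string DetailPrefix = "/Home/Detail/";

    // Builds the redirect target, percent-encoding the country name.
    public static string BuildDetailPath(string validatedCountryName)
    {
        if (validatedCountryName == null)
        {
            throw new ArgumentNullException(nameof(validatedCountryName));
        }

        return DetailPrefix + Uri.EscapeDataString(validatedCountryName);
    }

    // Recovers the original country name from an escaped detail path.
    public static string ReadCountryName(string detailPath)
    {
        if (detailPath == null)
        {
            throw new ArgumentNullException(nameof(detailPath));
        }

        if (!detailPath.StartsWith(DetailPrefix, StringComparison.Ordinal))
        {
            throw new ArgumentException("Not a detail path.", nameof(detailPath));
        }

        return Uri.UnescapeDataString(detailPath.Substring(DetailPrefix.Length));
    }
}
```

FriendlyUrl.BuildDetailPath("Åland Islands") gives /Home/Detail/%C3%85land%20Islands, and FriendlyUrl.ReadCountryName turns that path back into “Åland Islands”, ready to be compared against the country list.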

[Image not preserved: the App Service page for “Åland Islands” after the fix, with a readable URL]

So a small code change to escape non-ASCII characters using Uri.EscapeDataString fixes my application – my URL is readable again, and the country name in it can be easily compared to the list of ISO 3166-1 countries.

Wrapping up

So this post might seem to be about how to correctly encode strings, but really I think it’s about making sure that you remember to test with character sets that you’re not familiar with, and also testing on an environment which is as similar as possible to your production environment. There’s always a way to make your application available to people who might not use the same alphabet as you.
