One of the most common ‘gotchas’ in a development project is to forget about performance until there’s a problem. I’ve often heard people quote Knuth saying “premature optimisation is the root of all evil” – hinting that right now is too early to think about performance tuning.
Of course performance tuning and improvement is put off, and put off, and put off some more…until there’s a performance test in pre-production and everything fails. (That’s if you’re lucky – at least you’ve caught it before it goes to production. A lot of the time that’s the first place the issue is spotted).
I believe in making it work first before you make it work fast – but within that statement, there’s an implication that “working” and “working fast” are both necessary. Making it just work isn’t enough. And Knuth is being quoted out of context – the full quote is “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” (my emphasis). That’s small efficiencies, not big ones. He also says “In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering“. 12%!!
I’d like to share 3 tips that I’ve used to make a huge difference to the performance of a .NET application using Entity Framework. I’ve often heard people criticise Entity Framework as slow, but I’m staying out of the pointless endless religious arguments about whether it is or isn’t. All I can say is that from my experience, the performance bottleneck has never been Entity Framework’s fault – it’s either somewhere else, or the way that Entity Framework has been used.
This is nothing to do with Entity Framework – this is a change to the database, not the .NET code. Entity Framework generates SQL behind the scenes and sends this to the database for execution, and it has no idea whether this SQL is going to perform a hugely expensive full table scan, or whether it’s going to use indices cleverly to prevent having to search every row in the database.
For me, this is the first port of call when someone says an application accessing a database is slow. SQL Server has some great tools to help with this – you can use SQL Profiler to record a trace file of all the SQL queries hitting a database over a period of time, and then use this trace file in Database Engine Tuning Advisor to identify what indices that the engine thinks will make the biggest difference to your application.
I’ve seen amazing improvements result from this technique – 97% improvements are not uncommon. Again, it’s not really an Entity Framework tip, but it’s worth checking.
The “Select N+1” Problem
So again, not really an Entity Framework problem…yes, there’s a bit of a theme emerging here! This is something that’s common to a lot of ORMs.
Basically I think of the problem as being a side effect of “lazy loading”. For example, say your application queries a database about cars. Cars are represented by a “Car” POCO object, which contains a list of child objects of POCO type “Wheel”.
From your application, you might query by primary key for a car with registration plate “ABC 123”, which (hopefully) returns one object as the result. Then you call the “Wheels” method to get information about the car’s wheels.
If your database is logically normalised, you’ve probably made at least two queries here – the original one to get the car, and then another to get information about the wheels. If you then call a property from the “Wheel” object which makes up the list, you’ll probably make another database query to get that information.
This is actually a massive advantage of ORMs – you as a developer don’t have to do extra work to load in information about child objects, and the query only happens when the application asks for information about that object. It’s all abstracted away from you, and it’s called lazy-loading.
There’s nothing wrong or evil with lazy-loading. Like any tool, it has a place and there are opportunities to mis-use it. Where I’ve seen it misused most is in the scenario where a developer:
- returns an object from an Entity Framework call;
- closes the session (i.e. connection to the database);
- looks in the parent object for a child object, and gets an exception saying the session is closed;
The developer then does one of two things:
- The developer moves all the logic into the method where the session is open because lazy loading fixes all their problems. This leads to a big mess of code. At some point – always – this code is copied and pasted, usually into a loop, leading to loads and loads of database queries. Because SQL Server is brilliant, it’s probably done all of these queries in a few seconds, and no-one really notices until it’s deployed to production and hundreds of users try to do this all at once and the site collapses. (Ok this is over dramatic – your performance testing events will catch this. Because of course you’re doing performance testing before going to production, aren’t you. Aren’t you?)
- The better developer realises that moving all the code into one method is a bad idea, and even though lazy loading allows you to do this, it’s mis-using the technique. They read a few blogs, discover this thing called eager loading and write code like this:
var car = (from c in context.Cars.Include("Wheel") where c.RegistrationPlate == "ABC 123" select c).FirstOrDefault<Car>();
Entity Framework is smart enough to recognise what’s going on here – instead of doing a dumb query on the Car table, it joins to the Wheel table and sends one query out to get everything it needs for the Car and the Wheels.
So this is good – but in my career, almost every application has a much more complex relationship between object and database entities than just one simple parent and child. This leads to much more complex chains of queries.
One technique I’ve used successfully is to create a database view which includes everything needed for the application business method. I like using views because it gives me much more granualar control over exactly what the joins are between tables, and also what fields are returned from the database. It also simplifies the Entity Framework code. But the biggest advantage is that the view becomes an interface – a contract really – between the database and the code. So if you have a DB expert who tells you “Look, your performance issues are down to how your database is designed – I can fix this, but if I do it’ll probably break your application“, you’ll be able to respond “Well, we query the database through a view, so as long as you’re able to create a view that has the same columns and output, you can change the database without affecting us.”
Of course if you’re using a database view, that means you won’t be able to update objects using Entity Framework because a view is read-only…which kind of defeats the purpose of using an ORM. However, if you have someone demanding a fix for a slow site, it’s a lot less intrusive to create and index a view than it is to re-engineer the application.
Note: I’m not advocating this as a magic bullet – it’s just a technique that sometimes has its place.
This is an Entity Framework setting. If you’re using views – or you know that your Entity Framework call won’t need to update the database – you can get an extra performance boost by using the AsNoTracking keyword.
var cars = context.Cars.AsNoTracking().Where(c => c.Color == "Red");
This will give you a performance boost if you’re returning large volumes of data, but less so for smaller volumes. Your mileage may vary – but remember you need to be sure you aren’t updating the context to use this.
- Ignore the wisdom of the newsgroup posts that say “Entity Framework’s just slow, nothing you can do”;
- Instead, run SQL Server profiler on the database, and put the resulting trace file through SQL Server’s Database Engine Tuning Adviser to find indices that will improve the slowest queries;
- Analyse the code to identify the “Select N+1” problem – there almost always is one of these in the code somewhere. If you want to find it, turn off lazy loading and run your tests.
- If you’re returning large volumes of data into a read-only list, see if you can use AsNoTracking to squeeze a bit more performance from your application.