.Net developers always want queries to run faster. When you're dealing with really large data sets (think in terabytes of data) it's especially important to save time wherever you can. Partitioning, and the resulting partition elimination, can save hours of query time in these huge databases.
Partition elimination takes advantage of a few relatively simple tricks to save lots of query time at the expense of a little compile time. And if you compile while you're waiting on something else, like a network connection or some parallel process, it might not cost you any time at all.
This is the lesson learned at the recent Beantown .NET meeting. This month's meeting focused on Andrew Novick's presentation, “Partitioning SQL Server Tables, Views and Indexed Views”. Andrew Novick is a developer, consultant, and Microsoft certified SQL Server MVP. Novick explained how - and more importantly why - to use partitioning to improve large databases by reducing query time and simplifying data management.
According to Novick the first thing to do is to get as many disks as possible. When it comes to databases, many disks make for light work. But there's a limit to how many disks you can get your hands on, so when you've maxed those out you have to start looking for more creative solutions.
Partitioning the database
Partitioning is one way to go. You can use partitioning to maximize query performance (in terms of using all available resources) and streamline data management (making it easier to delete old data, migrate data to cheaper resources, and minimize maintenance tasks).
Novick detailed a sliding window technique, whereby the newest and therefore most malleable data stays at the forefront while older data (that still needs to be available but no longer needs to be edited) can be dropped into read-only files. Read-only files are a great alternative to keeping them in the window of data you're focused on or deleting them altogether.
The big advantage with the read-only files is that once you've created them, there's little need to revisit them for maintenance.
Assuming you backup and defragment them at creation, there's no need to go back and do either of those tasks during regular maintenance.
With temporal data -- the data that you're still adding to regularly -- indexes can get in the way of new insertions. But once the data is stabilized and will not change, you no longer have to worry about new insertions and so this is the time to add indexes. The indexes will help when it's time to compile queries.
At compile time, the query compiler can eliminate those partitions that don't matter, removing them from consideration in the plan. For example, if you partition your data by dates and you know a specific date for your query, the query might only have to run within one partition instead of the whole database. When the whole data base is measured in terabytes and each partition is only a few gigabytes, that difference can reduce a search from hours to seconds.
For more information on SQL Server check out SearchSQLServer.com.
This was first published in November 2010