Considerations for dealing with big databases

Stress everywhere: users want everything "for yesterday"
Photo by Bernard Goldbach


If you are like me and manage SQL Server instances with terabytes of data, also known as very large databases (VLDBs), you know how challenging it is to perform any kind of task within a reasonable time window while minimizing outages and keeping users happy with the overall performance of the instance.

I want to share some tips to make your life easier with a variety of common tasks you may encounter.

 

Optimize disk space


Perhaps the most common task when dealing with VLDBs, and a real headache if the server was handed over to you with no further information, is optimizing disk space.

SQL file locations

Be aware of the location of each data and log file on your instance; find out how much space each database file takes and whether databases are split across several files/folders (a quick query for this follows the list below).
The next step is to arrange the datafiles following best practices, separating them onto different drives as follows (if possible):
  • System databases (except for tempdb).
  • Tempdb database.
  • User datafiles (.mdf and .ndf files).
  • User logfiles (.ldf files).
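To find out where every file lives and how big it is, a query against the sys.master_files catalog view is usually enough (sizes are stored in 8 KB pages, hence the conversion):

    SELECT  DB_NAME(database_id)  AS database_name,
            type_desc             AS file_type,
            name                  AS logical_name,
            physical_name,
            size * 8 / 1024       AS size_mb
    FROM    sys.master_files
    ORDER BY size DESC;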

 

Use multiple filegroups

Most databases start with only the default filegroup. As your database grows bigger and bigger, you should consider splitting it into multiple filegroups: you can move big historical or log tables into a different filegroup and even mark it as read-only, or separate tables by business rules so that more critical data can be backed up more often using filegroup backups (more on this later on).
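As a minimal sketch (the database and file names here are hypothetical), adding a filegroup for historical data looks like this:

    -- Add a filegroup for historical data and give it its own file
    ALTER DATABASE SalesDB ADD FILEGROUP HistoryFG;

    ALTER DATABASE SalesDB
    ADD FILE (NAME = SalesHistory1,
              FILENAME = 'E:\SQLData\SalesHistory1.ndf',
              SIZE = 50GB)
    TO FILEGROUP HistoryFG;

    -- Once the historical tables have been moved onto it, freeze it
    ALTER DATABASE SalesDB MODIFY FILEGROUP HistoryFG READ_ONLY;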

Monitor file growth

Put regular database growth monitoring in place (how you achieve that is up to you); a timely alarm can spare you an unpleasant outage or a critical ticket.
You can set an alarm (email, SMS, IM) to fire when a certain threshold is reached, for example a disk with 80% of its space used, so you have enough response time to check what is going on.
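One way to feed such an alarm, as a sketch: check free space per volume with sys.dm_os_volume_stats and flag anything above the 80% threshold:

    SELECT DISTINCT
           vs.volume_mount_point,
           vs.total_bytes / 1073741824     AS total_gb,
           vs.available_bytes / 1073741824 AS free_gb
    FROM   sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
    WHERE  100.0 * (vs.total_bytes - vs.available_bytes) / vs.total_bytes > 80;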

When creating new datafiles, set the expected size upon creation; don't rely on the auto-growth feature. While it is recommended to keep auto-growth enabled as a safeguard, always expand your datafiles manually to the expected size.
Avoid percentage auto-growth: 10% of 1 GB is not the same as 10% of 1 TB, so always prefer fixed growth sizes.
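Both settings are one ALTER DATABASE away; a sketch with hypothetical names (MODIFY FILE changes one property at a time):

    -- Pre-size the file manually to the expected size
    ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesData1, SIZE = 200GB);

    -- And switch auto-growth from a percentage to a fixed increment
    ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesData1, FILEGROWTH = 512MB);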

Data compression

Another SQL Server feature is row and page compression, which reduces disk space; the only drawback is that extra system resources are consumed to decompress the data when it is accessed.
Page compression usually saves more disk space, but also has a higher resource cost to compress and decompress the data.
If your server already has high resource consumption, maybe this is not for you.
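A sketch, assuming a hypothetical dbo.SalesHistory table: estimate the savings first with the built-in sp_estimate_data_compression_savings procedure, then enable compression with a rebuild:

    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'SalesHistory',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    ALTER TABLE dbo.SalesHistory
    REBUILD WITH (DATA_COMPRESSION = PAGE);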


Backup/restore


If you don't have an adequate configuration for your VLDBs, the backup and restore windows can also be an issue, especially if you have tight RTOs for critical systems.

Backup compression

Available since SQL Server 2008, this feature is not commonly used because it is not active by default; it can save you a lot of disk space for backups and reduce the backup time.
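As a sketch (names and paths hypothetical), you can compress a single backup or make compression the instance-wide default:

    BACKUP DATABASE SalesDB
    TO DISK = 'G:\Backups\SalesDB.bak'
    WITH COMPRESSION;

    -- Or turn it on for every backup by default
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE;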

Filegroup backup

As stated above, consider separating your data into multiple datafiles and filegroups. With the filegroup backup option you can apply any business logic you want, so more critical data can be backed up more frequently than the whole set; this also reduces the RTOs for your databases.
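A sketch, reusing the hypothetical names from above: back up only the filegroup that holds the critical data, on a tighter schedule than the rest:

    BACKUP DATABASE SalesDB
    FILEGROUP = 'PRIMARY'
    TO DISK = 'G:\Backups\SalesDB_primary.bak'
    WITH COMPRESSION;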

Differential backup is your friend

A good backup strategy for big databases must consider differential backups. Since these backups only store the changes since the last full backup, the file grows larger as more time passes after that full backup, so you have to find the balance between full and differential backups that best suits your needs.
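A sketch of a typical cadence (names and paths hypothetical): a weekly full backup plus daily differentials:

    -- e.g. Sundays
    BACKUP DATABASE SalesDB
    TO DISK = 'G:\Backups\SalesDB_full.bak'
    WITH COMPRESSION;

    -- e.g. every other day of the week
    BACKUP DATABASE SalesDB
    TO DISK = 'G:\Backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL, COMPRESSION;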



High Availability solutions


High availability is mandatory for critical databases. Even though configuring it can be a little more difficult due to the size of the databases, in case of a failure it can save your life: it is always faster to fail over to your secondary database copy than to perform a full restore from backups.
Which high availability solution you use depends a lot on your budget and system configuration, but anything is better than nothing at all.
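For example, with an Always On availability group (the group name here is hypothetical), a planned failover is a single statement run on the secondary replica:

    ALTER AVAILABILITY GROUP SalesAG FAILOVER;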


Transaction log


A couple of times in my personal experience, sudden transaction log growth has filled the disks and the alarms started to fire everywhere. None of those times was funny at all!


Backup your transaction log regularly

The more frequently you back up your log, the smaller the files will be. However, since you need every transaction log backup taken since the last full/differential backup in order to restore, you also have to find the balance between backup frequency and file size.
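The statement itself is simple; a sketch you would schedule every few minutes on a busy VLDB (path hypothetical):

    BACKUP LOG SalesDB
    TO DISK = 'G:\Backups\SalesDB_log.trn'
    WITH COMPRESSION;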

Simple recovery mode

If your database is big because it holds a lot of historical data used mainly as read-only, why bother taking log backups? Switching to the simple recovery model can save space you would never use anyway.
If your database mixes historical and highly transactional data, consider splitting it into separate databases and configuring the reporting ones with the simple recovery model.
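The switch is one statement; a sketch with a hypothetical reporting database:

    ALTER DATABASE ReportsDB SET RECOVERY SIMPLE;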

Avoid shrinking whenever possible

Database shrinking can lead to fragmentation; always use it with caution and only in exceptional situations.
If your log grows too often, you have an issue with a query or queries that need to be optimized.
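Before reaching for a shrink, it is worth checking why the log cannot reuse its space; the log_reuse_wait_desc column often points at the real culprit (an open transaction, a missing log backup, replication, and so on):

    SELECT name, log_reuse_wait_desc
    FROM   sys.databases;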


T-SQL performance


When it comes to application performance and object maintenance, especially if you are a hardcore database administrator, you really cannot do much when another team is in charge of the database programming. What you can do is generate reports and advise users on improvements that could benefit everyone.

Check for unused database objects

You will be surprised by the number of unused tables and indexes old databases have. Check regularly for unused indexes and tables (we will do a post on how to check this in the near future); this can help free up some space.
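Until then, a minimal sketch of the idea using sys.dm_db_index_usage_stats: indexes that are written to but never read since the last service restart are good candidates for review:

    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            i.name                   AS index_name,
            s.user_updates
    FROM    sys.dm_db_index_usage_stats AS s
    JOIN    sys.indexes AS i
            ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE   s.database_id = DB_ID()
            AND s.user_seeks + s.user_scans + s.user_lookups = 0;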

Optimize your T-SQL code

Bad programming techniques, missing indexes, and non-sargable predicates (not using search arguments) can lead to poorly performing queries. In a big database this means queries, reports, and processes that take a long time to complete, consume additional transaction log space, and consume additional tempdb space; all of these things are bad for a high-response database, so always identify your top resource-consuming queries and try to optimize them.
Also, as stated above, separating your tables across partitions/filegroups/datafiles/databases can help overall performance.
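A sketch for finding the top consumers, using the sys.dm_exec_query_stats and sys.dm_exec_sql_text DMVs (here ranked by total CPU):

    SELECT TOP (10)
           qs.total_worker_time / 1000 AS total_cpu_ms,
           qs.execution_count,
           st.text AS query_text
    FROM   sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;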

Reorganize and rebuild indexes

If your database is small this is not an issue, but for big databases, think twice before reorganizing or rebuilding an index: it can be a high-resource, time-consuming task, and both your database's transaction log and tempdb can grow significantly in a matter of minutes.
Avoid using generic maintenance plans for big databases; instead, implement specially tuned index and statistics maintenance jobs that can determine whether maintenance is really needed for a particular object.
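A sketch of the "only if needed" idea: check fragmentation first with sys.dm_db_index_physical_stats and act only above the commonly cited thresholds (reorganize between roughly 10% and 30%, rebuild above 30%):

    SELECT OBJECT_NAME(ps.object_id) AS table_name,
           i.name                    AS index_name,
           ps.avg_fragmentation_in_percent,
           CASE WHEN ps.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
                WHEN ps.avg_fragmentation_in_percent > 10 THEN 'REORGANIZE'
                ELSE 'leave alone' END AS suggested_action
    FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN   sys.indexes AS i
           ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    WHERE  ps.index_id > 0;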


Please be aware that all the suggestions described above are in no particular order and can vary drastically from one environment to another, so always keep this in mind!
