Saving disk space by choosing the correct data type
"Database is almost 4 TB in size, user is complaining on long times for DEV data refresh and high storage costs for database and backups. Full backups are taking a long time to complete, even with compression enabled.
After a quick check, we discover tables that stores data from 15 years ago, without any purge or historical archive process implemented, most of the data stored is only with logging purposes, using incorrect data types and storing a lot of nulls. After changing data types and implementing data purge processes, database size was reduced to almost 90 GB, drastically improving their application performance and reducing storage costs and refresh times."
- true story... or not -
Designing our databases it is something that must be done carefully and by knowing how our application will work and choosing the right data types, specially for databases that will contain high volumes of data, can drastically improve our application response times and storage required.
We will compare storage usage for the most common data types. For displaying the table size I'm using my database dashboard report.
Always analyze if you really need to store times, or if you need to store time, the precision you need for your application to work correctly.
We create different tables with each datatype we want to test: datetime, smalldatetime, datetime2, datetime2(0), date.
Then we proceed to insert the same info on each table (500,000 rows to have a good sample space)
After the insert ends, we can see the different storage usages, as well as the data that is stored on each table.
|Datetime needs around 20% more space than Date
We need to check the possible maximum length of the data to be stored, if we require to store or not unicode characters, and if we require a fixed or variable length.
We create different tables with each datatype we want to test: Char, VarChar, NChar, NVarChar.
Again we insert same info on each table (500,000 rows also).
We can determine what datatype use more storage space, we can determine that storing unicode vs non-unicode characters make a big difference, so this can be one of the first things to check at design level.
|A good datatype for strings could make a big difference!
|'X' are too mainstream, besides, everybody loves '$'
For this datatype the requirements are more straightforward, so the only thing I do is to show you the different storage usages.
Again we create one table for each data type to check (since for numbers there are a lot of parameters and various subtypes, we will only check the most common with the defaults): Int, Decimal, Numeric, Money (exact types) and Real, Float (approximate types).
We insert same info on each table (500,000 rows also).
And again, with the default parameters for each data type, we can see the storage used by each one, so is always important to know your data, investigate the numbers to be stored in your database so you can optimize space usage.
|By default, numeric and decimal datatypes use the most space, this can change based on the parameters specified.
|Note that decimal and numeric defaults have no decimal digits
and values were approximated.
Make your own adjustments for specific tests
You can check the different data-types SQL Server offers, with their storage requirements as well their limitations, link is below, so you have a better understanding on how each one works, this will help you to design more efficient applications and reduce your future administration time, or maybe you can optimize databases already in place by just adjusting column types :)
Please note that these are only general guidelines, also I have used test data, always make sure to perform your own tests at development stage and make your own comparisons.