A common scenario in data warehousing applications is knowing what source system records to update, what data needs to be loaded and which data rows can be skipped as nothing has changed since they were last loaded. Another possible scenario is the need to facilitate searching data that is encrypted using cell level encryption or storing application passwords inside the database.
Data Hashing can be used to solve this problem in SQL Server.
A hash is a number that is generated by reading the contents of a document or message. Different messages should generate different hash values, but the same message causes the algorithm to generate the same hash value.
The HashBytes function in SQL Server |
SQL Server has a built-in function called HashBytes to support data hashing. HashBytes ( ' Here is a sample along with the return values commented in the next line : Select HashBytes('MD2', 'Hello world!') --0x63503D3117AD33F941D20F57144ECE64 Select HashBytes('MD4', 'Hello world!') --0x0D7A9DB5A3BED4AE5738EE6D1909649C Select HashBytes('MD5', 'Hello world!') --0x86FB269D190D2C85F6E0468CECA42A20 Select HashBytes('SHA', 'Hello world!') --0xD3486AE9136E7856BC42212385EA797094475802 Select HashBytes('SHA1', 'Hello world!') --0xD3486AE9136E7856BC42212385EA797094475802 Select HashBytes('SHA2_256', 'Hello world!') --0xC0535E4BE2B79FFD93291305436BF889314E4A3FAEC05ECFFCBB7DF31AD9 E51A Select HashBytes('SHA2_512', 'Hello world!') --0xF6CDE2A0F819314CDDE55FC227D8D7DAE3D28CC556222A0A8AD66D91CCA D4AAD6094F517A2182360C9AACF6A3DC323162CB6FD8CDFFEDB0FE038F55E8 5FFB5B6 |
|
Properties of good hash functions |
A good hashing algorithm has these properties: · It is especially sensitive to small changes in the input. Minor changes to the document will generate a very different hash result. · It is computationally unfeasible to reverse. There will be absolutely no way to determine what changed in the input or to learn anything about the content of an input by examining hash values. For this reason, hashing is often called one-way hashing. · It is very efficient. |
|
Should you encrypt or hash? |
In the last 2 weeks, we described how to use cell-level encryption and TDE in SQL server. During application development, it might be useful to understand when to encrypt your data vs. when to hash it. The difference is that encrypted data can be decrypted, while hashed data cannot be decrypted. Another key difference is that encryption normally results in different results for the same text but hashing always produces the same result for the same text. The deciding factor when choosing to encrypt or hash your data comes after you determine if you'll need to decrypt the data for offline processing. A typical example of data that needs to be decrypted would be within a payment processing system is a credit card number. Thus the credit card number should be encrypted in the payment processing system. However, in the case of security code for the credit card, hashing it is sufficient if only equality checks are done and the system does not need to know it’s real value. Encryption is a two way process but hashing is unidirectional |
How to use hash bytes for indexing encrypted data. |
Encryption introduces randomization and in there is no way to predict the outcome of an encryption built-in. Does that mean creating an index on top of encrypted data is not possible? However, data hashing can come to your rescue. Refer to this blog post to learn how. |
|
Which hash function should I choose? |
Although, most hashing functions are fast, the performance of a hashing function depends on the data to be hashed and the algorithm used. There is no magic bullet. For security purposes, it is advised to use the strongest hash function (SHA2_512). However, you can choose other hashing algorithms depending on your workload and data to hash. |
No comments:
Post a Comment