Sunday, November 27, 2011

BEST PRACTICES FOR USING SQL SERVER NETWORK PROTOCOL

WHICH IS THE BEST PROTOCOL TO CONNECT TO SQL SERVER?

I often get this question from the community, and my answer depends on your network environment: whether you are on a local computer with no network, on a Local Area Network, or on a Wide Area Network. I’ve tried to provide some guidance and recommendations below that work well for most SQL Server installations.


In order to connect to SQL Server, your client machine must have a network protocol enabled that can communicate with the SQL Server Database Engine. By default, SQL Server can listen on four different protocols that come installed with it. However, not all of the installed protocols are enabled, and you may need to enable them on the server using SQL Server Configuration Manager before they can be used from your client machine.


THE FOUR NETWORK PROTOCOLS IN SQL SERVER ARE:



Shared Memory
This is the simplest of all the protocols available in SQL Server. The limitation is that client applications must reside on the same machine where SQL Server is installed.

Named Pipes
This protocol can be used when your application and SQL Server reside on a local area network. A portion of memory is used by one process to pass information to another process, so that the output of one is the input of the other. The second process can be local (on the same computer as the first) or remote (on a networked computer).

TCP/IP
This is the most popular and common protocol, widely used throughout the industry today. It communicates across interconnected networks, is a standard for routing network traffic, and offers advanced security features.

VIA (DEPRECATED - DO NOT USE)
The Virtual Interface Adapter (VIA) protocol works with VIA hardware. This protocol has been deprecated and will no longer be available in future versions of SQL Server.



BEST PRACTICES FOR USING SQL SERVER NETWORK PROTOCOLS



WHAT PROTOCOL TO USE ON A LOCAL SERVER?

If you are running your application on the same local computer that runs SQL Server 2008, then the Shared Memory or Named Pipes protocol is your best choice. Named Pipes clients are loaded in kernel mode of the operating system and are very fast on the local machine.

Note: This is only true if both the application and SQL Server are running on the same machine; if not, Named Pipes can still be used over a Local Area Network.




WHAT PROTOCOL TO USE ON A LOCAL AREA NETWORK (LAN)?

As long as your Local Area Network is fast, both TCP/IP and Named Pipes perform at about the same speed. However, when many requests are waiting to hit the server, Named Pipes may encounter a pipe-busy error when trying to connect to SQL Server, whereas TCP/IP sockets do not experience this problem because they support a backlog queue for pending requests.




WHAT PROTOCOL TO USE ON A WIDE AREA NETWORK (WAN)?

Data transmission using TCP/IP sockets is much more streamlined and has relatively less overhead compared to the other network protocols. TCP/IP can also take advantage of many performance features supported by the operating system, including windowing and delayed acknowledgement, which can be very helpful on slow networks.
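Regardless of which protocol you choose, once connected you can verify which protocol your session is actually using by querying the sys.dm_exec_connections dynamic management view:

```sql
--CHECK THE PROTOCOL USED BY THE CURRENT CONNECTION
--NET_TRANSPORT RETURNS VALUES SUCH AS 'Shared memory', 'Named pipe' OR 'TCP'
SELECT session_id, net_transport, auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```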



Thanks,

Wednesday, November 9, 2011

TRIGGER-LESS TRACKING OF DML OPERATIONS

There are times when you need to track all the changes happening to a table. In the past, developers have used custom code in stored procedures, triggers, and more recently Change Data Capture (also known as CDC) to track changes.


SQL Server now supports a simplified method of auditing changes made by DML statements without having to use triggers or CDC (Change Data Capture). The OUTPUT clause, available as part of a DML statement, can help you track the changes made during any DML operation.


The OUTPUT clause can save the result set in a table or table variable. The functionality is similar to the INSERTED and DELETED virtual tables that triggers use to access the rows modified during a DML operation.



Example:

In this example we will change the ITEMCOMMENT value from the ITEM table to the reverse of the original value and store the original and changed values in a table.



TRACKING CHANGES USING OUTPUT CLAUSE
--CREATE ITEM TABLE
CREATE TABLE ITEM (ITEMNAME VARCHAR(40), ITEMCOMMENT VARCHAR(255))

--INSERT DATA IN TO ITEM TABLE
-- PLEASE NOTICE THE MULTIPLE INSERTS WITH ONE INSERT STATEMENT
INSERT INTO ITEM VALUES
('MANGO','ELPPA NAHT RETTEB SI OGNAM'),
('PEACH','ETATS HCAEP A SI AIGROEG'),
('GRAPES','TAERG ERA SEPARG')

-- QUERY THE ITEM TABLE
SELECT * FROM ITEM

  
--DECLARE A TABLE VARIABLE TO STORE THE CHANGED VALUE
DECLARE @RECORDCHANGES TABLE(CHANGE VARCHAR(255))

--UPDATE THE ITEM TABLE TO FIX ITEM COMMENTS
--NOTICE THE USE OF THE OUTPUT CLAUSE BELOW.
--THE CHANGES ARE STORED IN THE TABLE VARIABLE

UPDATE ITEM
SET ITEMCOMMENT=REVERSE(ITEMCOMMENT)
OUTPUT 'ORIGINAL VALUE: [' + DELETED.ITEMCOMMENT+'] HAS BEEN CHANGED TO: ['+ INSERTED.ITEMCOMMENT+']' INTO @RECORDCHANGES

--QUERY THE CHANGES FROM THE TABLE VARIABLE
SELECT * FROM @RECORDCHANGES


--QUERY THE ITEM TABLE
SELECT * FROM ITEM




NOTE: OUTPUT CLAUSE IS NOT SUPPORTED FOR THE FOLLOWING STATEMENTS:

1. DML statements that reference local partitioned views, distributed partitioned views, or remote tables.

2. INSERT statements that contain an EXECUTE statement.

3. Full-text predicates are not allowed in the OUTPUT clause when the database compatibility level is set to 100.

4. The OUTPUT INTO clause cannot be used to insert into a view or rowset function.

5. A user-defined function cannot be created if it contains an OUTPUT INTO clause that has a table as its target.
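The same technique works with DELETE and INSERT as well. As a minimal sketch, here is how you could capture rows as they are deleted from the ITEM table created in the example above:

```sql
--DECLARE A TABLE VARIABLE TO HOLD THE DELETED ROWS
DECLARE @DELETEDITEMS TABLE (ITEMNAME VARCHAR(40), ITEMCOMMENT VARCHAR(255))

--DELETE A ROW AND CAPTURE IT USING THE OUTPUT CLAUSE
DELETE FROM ITEM
OUTPUT DELETED.ITEMNAME, DELETED.ITEMCOMMENT INTO @DELETEDITEMS
WHERE ITEMNAME = 'MANGO'

--QUERY THE CAPTURED ROWS
SELECT * FROM @DELETEDITEMS
```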



Please refer to SQL Server BOL (Books Online) for detailed information on the Output clause.

Wednesday, November 2, 2011

UNDERSTANDING DATA HASHING TECHNIQUES IN SQL SERVER

A common scenario in data warehousing applications is knowing which source system records to update, which data needs to be loaded, and which rows can be skipped because nothing has changed since they were last loaded. Another possible scenario is the need to search data that is encrypted using cell-level encryption, or to store application passwords inside the database.



Data Hashing can be used to solve this problem in SQL Server.



A hash is a number that is generated by reading the contents of a document or message. Different messages should generate different hash values, but the same message always causes the algorithm to generate the same hash value.
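For example, in the data warehousing scenario above, a hash of the incoming row can be compared to a hash of the stored row to decide whether an update is needed. The table and column names below (SRC, DEST, CUSTID, CUSTNAME, CITY) are illustrative only:

```sql
--ILLUSTRATIVE SKETCH: COMPARE ROW HASHES TO DETECT CHANGED ROWS
SELECT S.CUSTID
FROM SRC S
JOIN DEST D ON D.CUSTID = S.CUSTID
WHERE HASHBYTES('SHA2_256', S.CUSTNAME + '|' + S.CITY)
   <> HASHBYTES('SHA2_256', D.CUSTNAME + '|' + D.CITY);
```

The '|' separator guards against two different column value pairs concatenating to the same string.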





The HashBytes function in SQL Server
SQL Server has a built-in function called HashBytes to support data hashing.

HashBytes ( '<algorithm>', { @input | 'input' } )
<algorithm> ::= MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512

Here are some samples, with the return value shown as a comment on the following line:
Select HashBytes('MD2', 'Hello world!')
--0x63503D3117AD33F941D20F57144ECE64

Select HashBytes('MD4', 'Hello world!')
--0x0D7A9DB5A3BED4AE5738EE6D1909649C

Select HashBytes('MD5', 'Hello world!')
--0x86FB269D190D2C85F6E0468CECA42A20

Select HashBytes('SHA', 'Hello world!')
--0xD3486AE9136E7856BC42212385EA797094475802

Select HashBytes('SHA1', 'Hello world!')
--0xD3486AE9136E7856BC42212385EA797094475802

Select HashBytes('SHA2_256', 'Hello world!')
--0xC0535E4BE2B79FFD93291305436BF889314E4A3FAEC05ECFFCBB7DF31AD9E51A

Select HashBytes('SHA2_512', 'Hello world!')
--0xF6CDE2A0F819314CDDE55FC227D8D7DAE3D28CC556222A0A8AD66D91CCAD4AAD6094F517A2182360C9AACF6A3DC323162CB6FD8CDFFEDB0FE038F55E85FFB5B6






Properties of good hash functions
A good hashing algorithm has these properties:

- It is especially sensitive to small changes in the input. Minor changes to the document will generate a very different hash result.
- It is computationally infeasible to reverse. There is no way to determine what changed in the input, or to learn anything about the content of an input, by examining its hash value. For this reason, hashing is often called one-way hashing.
- It is very efficient.






Should you encrypt or hash?
In the last two weeks, we described how to use cell-level encryption and TDE in SQL Server. During application development, it is useful to understand when to encrypt your data and when to hash it.
The difference is that encrypted data can be decrypted, while hashed data cannot: encryption is a two-way process, but hashing is one-way. Another key difference is that encryption normally produces different output for the same text, while hashing always produces the same output for the same text. The deciding factor when choosing to encrypt or hash your data is whether you will ever need to decrypt the data for later processing.
A typical example of data that needs to be decrypted is a credit card number within a payment processing system, so the credit card number should be encrypted. In the case of the card's security code, however, hashing is sufficient if only equality checks are performed and the system never needs to know its real value.
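As a sketch of the hashing approach, the security code could be stored and checked like this (the schema is illustrative only, and a real system would also add a per-row salt before hashing):

```sql
--STORE ONLY THE HASH OF THE SECURITY CODE (ILLUSTRATIVE SCHEMA)
DECLARE @CARDS TABLE (CARDID INT, SECURITYCODEHASH VARBINARY(64))
INSERT INTO @CARDS VALUES (1, HASHBYTES('SHA2_512', '123'))

--EQUALITY CHECK: HASH THE USER INPUT AND COMPARE; NO DECRYPTION NEEDED
SELECT CARDID FROM @CARDS
WHERE SECURITYCODEHASH = HASHBYTES('SHA2_512', '123')
```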



How to use HashBytes for indexing encrypted data
Encryption introduces randomization, and there is no way to predict the output of the encryption built-in functions. Does that mean creating an index on top of encrypted data is not possible?

Data hashing, however, can come to your rescue. Refer to this blog post to learn how.
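The basic idea is to store, alongside the encrypted column, a hash of the plaintext (computed by the application or the insert procedure) and index that hash; equality searches then use the index on the hash instead of the unpredictable ciphertext. A minimal sketch follows; the schema is illustrative and key management is omitted:

```sql
--ILLUSTRATIVE SKETCH: INDEX A HASH OF THE PLAINTEXT NEXT TO THE CIPHERTEXT
CREATE TABLE CUSTOMER
(
    CUSTID INT,
    SSN_ENCRYPTED VARBINARY(128),  --CIPHERTEXT FROM CELL-LEVEL ENCRYPTION
    SSN_HASH VARBINARY(32)         --HASH OF THE PLAINTEXT, COMPUTED AT INSERT TIME
)
CREATE INDEX IX_CUSTOMER_SSN_HASH ON CUSTOMER (SSN_HASH)

--SEARCH BY HASHING THE SEARCH VALUE; THE INDEX ON SSN_HASH CAN BE USED
SELECT CUSTID FROM CUSTOMER
WHERE SSN_HASH = HASHBYTES('SHA2_256', '111-22-3333')
```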




Which hash function should I choose?
Although most hashing functions are fast, the performance of a hashing function depends on the data to be hashed and the algorithm used.

There is no magic bullet. For security purposes, it is advisable to use the strongest hash function available (SHA2_512). However, you can choose other hashing algorithms depending on your workload and the data to hash.





Hash functions or CHECKSUM()?
SQL Server also has the CHECKSUM() (and BINARY_CHECKSUM()) functions for generating a checksum value computed over a row of a table or over a list of expressions.

One problem with the CHECKSUM() and BINARY_CHECKSUM() functions is that the probability of a collision (two different inputs hashing to the same output value) may not be sufficiently low for all applications. Of course, collisions are possible with any function whose domain is larger than its range, but because CHECKSUM implements a simple XOR, the probability of collision is relatively high.

Try it out using the following example:

DECLARE @guid1 UNIQUEIDENTIFIER, @guid2 UNIQUEIDENTIFIER
SELECT @guid1 = '3DB7D309-A8F4-47C4-BA90-0CB458B44CB0', @guid2 = 'EFE7F2C5-19F9-42B9-9C16-21BED41E882B'
SELECT chksum_guid1 = CHECKSUM(@guid1), chksum_guid2 = CHECKSUM(@guid2)

Hash functions provide stronger guarantees against collision. The HashBytes function in SQL Server Denali provides data hashing and supports the following algorithms: MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512
 

Tuesday, August 30, 2011

SQL SERVER Fundamentals II (Stored Procedures vs. Functions)

In many instances you can accomplish the same task using either a stored procedure or a function. Both can be custom defined and be part of any application. Functions, however, are designed to send their output to a query or T-SQL statement: a User Defined Function (UDF) can be invoked from a SELECT or an action query, while a stored procedure (SPROC) is invoked using EXECUTE or EXEC. Functions are created using CREATE FUNCTION, while stored procedures are created using CREATE PROCEDURE.
To decide between the two, keep in mind the fundamental difference between them: stored procedures are designed to return their output to the application. A UDF can return a table variable, while a SPROC can't return a table variable, although it can create a table. Another significant difference is that a UDF can't change the server environment or your operating system environment, while a SPROC can. Operationally, when T-SQL encounters an error, a function stops executing, while a SPROC proceeds to the next statement in your code (provided you've included error handling support). You'll also find that although a UDF can be used in a SELECT with a FOR XML clause, a SPROC cannot be.
If you have an operation, such as a query with a FROM clause, that requires a rowset to be drawn from a table or set of tables, then a function is the appropriate choice. However, when you want to use that same rowset in your application, the better choice is a stored procedure.
There's quite a bit of debate about the performance benefits of UDFs vs. SPROCs. You might be tempted to believe that stored procedures add more overhead to your server than a UDF. Depending upon how you write your code and the type of data you're processing, this might not be the case. It's always a good idea to test both approaches on important or time-consuming operations.
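A quick illustration of the difference in how the two are invoked, using a hypothetical scalar UDF and an equivalent stored procedure:

```sql
--A SCALAR UDF CAN BE CALLED INLINE FROM A SELECT
CREATE FUNCTION dbo.FN_REVERSE_NAME (@NAME VARCHAR(40))
RETURNS VARCHAR(40)
AS
BEGIN
    RETURN REVERSE(@NAME)
END
GO
SELECT dbo.FN_REVERSE_NAME('MANGO')  --WORKS INSIDE A QUERY
GO

--A STORED PROCEDURE MUST BE INVOKED WITH EXECUTE/EXEC
CREATE PROCEDURE dbo.USP_REVERSE_NAME @NAME VARCHAR(40)
AS
    SELECT REVERSE(@NAME)
GO
EXEC dbo.USP_REVERSE_NAME 'MANGO'    --CANNOT APPEAR IN A SELECT LIST
```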

Monday, August 29, 2011

BEST PRACTICES FOR BACKING UP LARGE MISSION CRITICAL DATABASES

In an ideal world, hard drives and other hardware never fail, software is never defective, users do not make mistakes, and hackers are never successful. However, we live in a less than perfect world and we should plan and prepare to handle adverse events.

In today’s topic, we will focus on best practices for backing up large mission-critical databases. Performing and maintaining good backups is one of the top priorities for any DBA, developer, or engineer working with SQL Server.

KEEP IN MIND: BACKUP AND RESTORE IS NOT A HIGH AVAILABILITY FEATURE. YOU MUST PERFORM REGULAR BACKUPS OF YOUR DATABASES.

RESTORING a database from a backup is simply a repair feature and not an availability feature. If you are running a mission-critical system and your database requires high availability, then please look into the various high availability (HA) features available with SQL Server.

If you are running a large, mission-critical database system, then you need your database to be available continuously or for extended periods of time with minimal downtime for maintenance tasks. Therefore, the duration of situations that require databases to be restored must be kept as short as possible.

Additionally, if your databases are large, requiring longer periods of time for backup and restore, then you MUST look into some of the features SQL Server offers to increase the speed of backup and restore operations and minimize the effect on users during both.

USE MULTIPLE BACKUP DEVICES SIMULTANEOUSLY
If you are performing backups or restores on a large database, then use multiple backup devices simultaneously so the backup is written to all the devices at the same time. Using multiple backup devices in SQL Server allows database backups to be written to all devices in parallel. One of the potential bottlenecks in backup throughput is the backup device speed; using multiple backup devices can increase throughput in proportion to the number of devices used. Similarly, the backup can be restored from multiple devices in parallel.
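For example, a backup striped across multiple files on different drives might look like this (the database name and paths are illustrative):

```sql
--STRIPE THE BACKUP ACROSS MULTIPLE DEVICES; ALL ARE WRITTEN IN PARALLEL
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks_1.bak',
   DISK = 'E:\Backups\AdventureWorks_2.bak',
   DISK = 'F:\Backups\AdventureWorks_3.bak'
```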

USE MIRRORED MEDIA SET
Use a mirrored media set. A total of four mirrors is possible per media set. With a mirrored media set, the backup operation writes to multiple groups of backup devices. Each group of backup devices makes up a single mirror in the mirrored media set. Every mirror must use the same number and type of physical backup devices, and all must have the same properties.
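A mirrored media set can be created with the MIRROR TO clause of the BACKUP statement (paths are illustrative; WITH FORMAT creates the new media set):

```sql
--WRITE THE SAME BACKUP TO TWO MIRRORS
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks_A.bak'
MIRROR TO DISK = 'E:\Backups\AdventureWorks_B.bak'
WITH FORMAT
```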

USE SNAPSHOT BACKUPS (FASTEST BACKUP)
This is the fastest way to perform database backups. A snapshot backup is a specialized backup that is created almost instantaneously by using a split-mirror solution obtained from an independent hardware or software vendor. Snapshot backups minimize or eliminate the use of SQL Server resources to accomplish the backup. This is especially useful for moderate to very large databases in which availability is very important. Snapshot backups and restores can sometimes be performed in seconds, with very little or zero effect on the server.

USE LOW PRIORITY BACKUP COMPRESSION
Backing up databases using the backup compression feature can increase CPU usage, and the additional CPU consumed by the compression process can adversely impact concurrent operations. Therefore, when possible, create a low-priority compressed backup whose CPU usage is limited by Resource Governor, to prevent CPU contention.

USE FULL, DIFFERENTIAL AND LOG BACKUPS
If the database recovery model is set to FULL, then use a combination of backup types (FULL, DIFFERENTIAL, LOG). This will help you minimize the number of backups that need to be applied to bring the database to the point of failure.
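A typical cycle might combine the three backup types like this (the schedule and paths are illustrative):

```sql
--WEEKLY: FULL BACKUP
BACKUP DATABASE AdventureWorks TO DISK = 'D:\Backups\AW_FULL.bak'

--NIGHTLY: DIFFERENTIAL BACKUP (ONLY CHANGES SINCE THE LAST FULL)
BACKUP DATABASE AdventureWorks TO DISK = 'D:\Backups\AW_DIFF.bak'
WITH DIFFERENTIAL

--EVERY 15 MINUTES: TRANSACTION LOG BACKUP
BACKUP LOG AdventureWorks TO DISK = 'D:\Backups\AW_LOG.trn'
```

A restore to the point of failure then needs only the last full, the last differential, and the log backups taken after it.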

USE FILE/FILEGROUP BACKUPS
Use file and filegroup backups together with transaction log backups. These allow only the files that contain the relevant data, instead of the whole database, to be backed up or restored.

USE A DIFFERENT DISK FOR BACKUPS
Do not use the same physical disk that holds the database or log files for backups. Using the same physical disk not only affects performance, but may also reduce the recoverability of the plan.


Thanks,

Friday, August 19, 2011

SQL SERVER - Fundamentals- 1.

Today I was interviewing a fresher and was worried about the sorry state of fundamentals. So, in order to provide some insight that would help aspirants understand the fundamentals better, going forward I will post at least one post per week with fundamentals/definitions. To start with, here we go.

WHAT IS AN UNENFORCED RELATIONSHIP?
A link between tables in which a foreign key in one table references the primary key in another table, but referential integrity is not checked during INSERT and UPDATE transactions.
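For example, a foreign key added WITH NOCHECK and then disabled gives you such an unenforced relationship (the table and constraint names below are illustrative):

```sql
--ADD A FOREIGN KEY WITHOUT VALIDATING EXISTING ROWS
ALTER TABLE dbo.OrderDetail WITH NOCHECK
ADD CONSTRAINT FK_OrderDetail_Orders
FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)

--DISABLE THE CONSTRAINT SO INSERT/UPDATE ARE NOT CHECKED EITHER
ALTER TABLE dbo.OrderDetail NOCHECK CONSTRAINT FK_OrderDetail_Orders
```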


WHAT IS MANY TO MANY RELATIONSHIP?
A relationship between two tables in which rows in each table have multiple matching rows in the related table. For example, each sales invoice can contain multiple products, but each product can appear on multiple sales invoices.

WHAT IS A LINKED SERVER?
A definition of an OLE DB data source used by SQL Server distributed queries. The linked server definition specifies the OLE DB provider required to access the data, and includes enough addressing information for the OLE DB provider to connect to the data. Any rowsets exposed by the OLE DB data source can then be referenced as tables, called linked tables, in SQL Server distributed queries.
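A linked server to another SQL Server instance can be defined like this (the server and instance names are illustrative):

```sql
--DEFINE A LINKED SERVER POINTING AT ANOTHER SQL SERVER INSTANCE
EXEC sp_addlinkedserver
    @server = 'REMOTESQL',
    @srvproduct = '',
    @provider = 'SQLNCLI',
    @datasrc = 'RemoteHost\Instance1'

--QUERY IT WITH A FOUR-PART NAME
SELECT name FROM REMOTESQL.master.sys.databases
```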

Wednesday, August 10, 2011

EFFICIENTLY MANAGE LARGE DATA MODIFICATIONS

Did you know that you can now use the TOP operator for Deleting, Inserting and Updating data in SQL Server tables?

Using the TOP operator for DML operations can help you execute very large data operations by breaking the process into smaller pieces. This can potentially improve performance and also improves database concurrency for large, highly accessed tables. It is considered one of the best techniques for managing data modifications on large data loads for reporting or data warehouse applications.

When you perform an update on a large number of records as a single set-based operation, it can cause the transaction log to grow considerably. However, when the same operation is processed in chunks, each chunk is committed after completion, allowing SQL Server to potentially re-use the transaction log space. Another classic issue many of us have experienced: when you are performing a very large data update and cancel the query for some reason, you have to wait a long time while the transaction rolls back.

With this technique you can perform data modifications in smaller chunks and resume your updates more quickly. Chunking also allows more concurrency against the modified table, letting user queries jump in instead of waiting several minutes for a large modification to finish.

Let’s take an example of deleting records in chunks of 1000 rows. Assume a table called LARGETABLE that has millions of records and you want to delete 1000 records per chunk:
DELETING LARGE TABLE IN CHUNKS

--CREATE A DEMO TABLE CALLED LARGETABLE
CREATE TABLE LARGETABLE (ID_COL INT IDENTITY(1,1), COL_A VARCHAR(10),COL_B VARCHAR(10))
GO

--INSERT THE DATA INTO LARGETABLE. NOTICE THE USE OF 'GO 10000' TO REPEAT THE BATCH
INSERT INTO LARGETABLE VALUES ('A','B')
GO 10000

--QUERY THE TABLE
SELECT COUNT(*) FROM LARGETABLE;

--PERFORM DELETION OF 1000 ROWS FROM LARGETABLE
WHILE (SELECT COUNT(*) FROM LARGETABLE) > 0
BEGIN
DELETE TOP (1000) FROM LARGETABLE
SELECT LTRIM(STR(COUNT(*)))+' RECORDS REMAINING TO BE DELETED' FROM LARGETABLE --SHOW PROGRESS AFTER EACH CHUNK
END


The above technique can also be used with INSERT and UPDATE commands.
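For example, here is the same chunking pattern applied to an UPDATE against the LARGETABLE created above; the @@ROWCOUNT check stops the loop once no rows remain to be updated:

```sql
--UPDATE COL_A IN CHUNKS OF 1000 ROWS
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) LARGETABLE
    SET COL_A = 'X'
    WHERE COL_A <> 'X'

    IF @@ROWCOUNT = 0 BREAK  --NO MORE ROWS TO UPDATE
END
```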