SQL Server Business Intelligence: SQL Server Replication strategy

Friday, July 24, 2009

SQL Server Replication strategy

Push vs. pull: Configuring SQL Server replication

Microsoft SQL Server replication is an extremely powerful technique that is used to move data in real time -- or near real time -- between servers in either a single direction (from the Publisher to the Subscriber) or bi-directionally (from the Publisher to the Subscriber and the Subscriber to the Publisher).

To ensure SQL Server replication runs at peak efficiency, many important decisions need to be made when configuring the tool.

Push or pull?
One such decision is determining where to run the Distribution Agent (or Merge Agent if you are using the merge replication topology).

More on SQL Server replication



Simplify SQL Server replication

By default, the SQL Server job will be configured on and run by the SQL Agent service on the Distributor. This is called a push configuration since the SQL Agent is pushing changes from the Distributor to the Subscriber. If the Agent ran on the Subscriber, it would be called a pull subscription because data would be pulled from the Distributor to the Subscriber.

When choosing the placement of the Distribution or Merge Agent, there are several factors to consider.

If you have multiple Subscribers and a dedicated Distributor, you probably should use a push subscription. This will remove the CPU and memory load of running Distribution or Merge Agents from the Subscriber, leaving those resources to process changes and service user requests.

On the other hand, if your Publisher and Distributor are located on the same physical server (or virtual server if your SQL Servers are virtualized) then you will probably want to run the Distribution or Merge Agents on the Subscribers. This will remove the load from your Publisher server, which is likely the more critical server in the SQL Server replication topology.

While these placement recommendations hold true when all the servers are on the same LAN with a high-speed network between them, you may need to change your options if you move to replicating data over a WAN. This is especially true if you have a large snapshot that needs to be published to the Subscribers (snapshots are discussed in depth further below).

SQL Server replication allows you to control the number of threads running at any one time, which enables you to apply more than one transaction to your Subscriber at a time. There is a built in bottleneck that will seriously hamper your efforts, however, specifically when using SQL Server 2005 and below.

In these versions, the Distribution Agent makes only a single connection to the distribution database. As a result, regardless of the number of threads you run to push data to the Subscriber, the data can only be pushed as quickly as that single thread allows. You'll find that SQL Server 2008 was upgraded to allow multiple threads to run from the Distribution Agent to the distribution database.

Therefore, if SQL Server replication is run over a WAN, especially a slower WAN or a WAN with high latency and the Distribution Agent is run at the Subscriber, the speed that the Distribution Agent can download data from the distribution database will be limited.

Note: Always keep in mind that speed and latency are two different things when dealing with networks.

Instead, the Distribution Agent should be run on the Distributor. This would allow for a single thread to pull data from the Distributor while multiple threads commit those transactions on the Subscriber.

Snapshots
When dealing with a snapshot, there are a couple of tricks that you can use to greatly decrease the time needed to process it over a WAN .

In general, you will want to reverse the technique. If you attempt to process the snapshot over the WAN, you will see a major slow down. Furthermore, in extremely high load environments with a high-latency WAN connection, you could be processing your snapshot slower than your data is changing.

When dealing with a snapshot, there are a couple of tricks that you can use to greatly decrease the time needed to process it over a WAN. 

The best trick to overcome this is to configure a push subscription. This will allow for the fastest processing of data once the snapshot has been processed.

When you are configuring the publication, select a local drive path (not a network share), that exists on the Distributor and on all the Subscriptions to store the snapshot. Then start your Distribution Agent to create the snapshot.

When the snapshot is created and the Distribution Agent begins processing it, stop the Distribution Agent. Once the SQL Server Agent job has stopped, navigate to the folder that the snapshot is in. By default this is C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\repldata\unc, though the folder will vary depending on where you configured SQL Server to store the data.

Compress the files and folders in this folder with your favorite compression technique. Copy the compressed file to the Subscriber and decompress the data into a folder with the same path as the Distributor. Then edit the SQL Server Agent job and copy all the parameters in the SQL Agent job step.

On your Subscriber, open a command prompt and navigate to the COM folder and type "distrib.exe". Note that C:\Program Files\Microsoft SQL Server\100\COM is the default location for SQL Server 2008, and you can simply replace 100 with 90 or 80 for SQL Server 2005 and SQL Server 2000, respectively.

Paste the switches that you copied from the SQL Server Agent job step into the command window and press Enter. This will cause the Distribution Agent to temporarily run on the Subscriber, and it will process the snapshot from the copy that has been placed on the Subscriber.

After the snapshot has been processed and the agent begins its normal processing of data from the Distributor, you can close the window and start the Distribution Agent on the Distributor.

Note that you cannot run the Distribution Agent on two servers at a time. The one that starts second will shut itself down with an error.

SQL Server replication can be an extremely powerful tool, but without proper planning or specific knowledge, you will likely end up with a deployment that matches the horror stories most people have.

ABOUT THE AUTHOR:   
Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's over 175-million-user installation, one of the largest in the world. Denny's areas of expertise include system architecture, performance tuning, replication and troubleshooting. Denny uses these skills on a regular basis in his current role as a Sr. Database Administrator and Architect at Awareness Technologies. Denny currently holds several Microsoft certifications related to SQL Server and is a Microsoft MVP. He is a member of PASS and Quest Software's Association of SQL Server Experts and has written numerous technical articles on SQL Server management.
Check out his blog: SQL Server with Mr. Denny
Get Clipmarks - The easiest way to email text, images and videos you find on the web.
Sent with Clipmarks

No comments: