Database Snapshots and SSIS
Did you think SQL Server 2005 database snapshots were really cool, but wondered how useful they were in practice? Building a Data Warehouse, here's an ideal use for them:
In my current project we're extracting data for the data warehouse from several SQL Server 2005 source system databases. Since we're extracting from multiple tables from each database, and since those tables continue to be populated during our extract, there is a definite risk that the extracted data is out of sync due to the extracts from different tables happens at slightly different times.
Database snapshots to the rescue
By first creating a snapshot of the source system database, and extracting from that static snapshot (instead of the original, changing database), these synchronization issues are avoided!
Even better, while the underlying database might be modified during the extract, the overhead for creating and using the snapshot is very small - it's proportional to the number of source system table pages changed during the extract.
1. The first Execute SQL Task creates the snapshot. Since no data is copied, this is a virtually instantaneous operation. At this stage the source system Connection Manager still points to the original database.
2. The first Script Task switches the database ConnectionString from the original database to the newly created snapshot database. This avoids having to create and keep a second connection manager in sync with the original database.
3. The Sequence Container contains all the Data Flow tasks etc. needed to move the source system data (now guaranteed to be static during the extract) into the data warehouse. NB: the following precedence constraints are "On Completion", so even if the extract fails, the snapshot will be deleted.
4. The second Script Task switches the ConnectionString back to the original database, i.e. almost same code as 2 above, but change "Source_System_Snapshot_Extract" into "Source_System", and vice versa.
5. The final Execute SQL Task simply drops the snapshot, again an instantaneous operation.
Wrapping up
One thing to watch out for is when additional database files are created, either manually by the DBA or automatically due to the source system having a maximum file size set. The snapshot creation command must specify ALL database files, so either update the snapshot command manually when this happens, or extend the Script Tasks to automatically figure out the number and filenames required.
In summary, database snapshots and SSIS have proved to be a very useful combination for extracting consistent data into the data warehouse.
No comments:
Post a Comment