Today a lot of companies use heterogeneous information systems as corporate data warehouses. In view of this fact, it is very important to synchronize databases belong to different DBMS in the most smooth and efficient way. And under such condition, the most challenging task is to achieve a real time update of modified data as both the source and the target databases could be in use constantly.
A classic approach to synchronization of database is based on comprehensive scanning of the original and the target databases in order to locate the data that is being changed or modified. It contains the following steps:
- Insert every rows from the data source which are missing the destination
- Delete from the target database every rows that’s missing in the source
- Update/upgrade every available rows
One problem of this method is the extremely long synchronization process. For instance, if there are two databases –SQL Server and MySQL, both of which have at least more than one million rows. Suppose that performance of a straightforward synchronization process is 1000 rows per second. It is impossible to further improve it by deploying high speed database access components or even the techniques as this method are incapable of bulk manipulations over data. Every round of that kind of synchronization takes almost an hour which makes it impossible to run near to real time basis.
Another way of approaching this problem is by iterating only those rows which are already modified since the earlier run of synchronization. It is possible to implement based on these algorithm:
- During the first run, the program performs classical database synchronization as it is mentioned above
- In addition, the program generates triggers on insert, delete and even for updates of every table that’s being synchronized. Every triggers write info about the modified row into a special service table
- From the second run, the program retrieves info about the modified rows straight from the service table and also updates only those rows
The approach described above is known as incremental or trigger-based database synchronization. Now, let’s examine how incremental method can reduce the time for database synchronization based on the example given above. Even without using bulk rows processing incremental synchronization can be executed in near to real time scheduling. However, it is also possible to implement bulk copy technique because all modified data that is accumulated in one place is grouped by type of operation – insert, delete and update. This method of data replication gives over 10 times higher performance of the synchronization process.
Incremental or a trigger-based database synchronization method applies these requirements: –
- Access to synchronized databases having sufficient privileges to create triggers and tables
- Every synchronized tables should have a defined primary key
To evaluate the overall benefits of incremental synchronization you can use SQL Server – MySQL Sync tool.