DMS: An Introduction
Anatomy and how it works
The Database Migration Service (DMS) is an AWS service that enables us to migrate vast amounts of data from our source databases, either as a one-time load or, via continuous replication, with minimal downtime.
Over the years, DMS has continuously evolved to support a wide range of engines, as well as migrations where the source and target engines are different (heterogeneous migrations).
DMS supports the following source/target engines, among others:
Source:
- Oracle
- MySQL
- Microsoft SQL Server
- MariaDB
- MongoDB
- Db2 LUW
- SAP ASE
- PostgreSQL
- Amazon Aurora (MySQL & PostgreSQL)
Target:
- Oracle
- MySQL
- Microsoft SQL Server
- Amazon Aurora
- Amazon Redshift
- Amazon S3
- Amazon DynamoDB
- SAP ASE
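The two migration styles (one-time load vs continuous replication) and the homogeneous/heterogeneous distinction map onto decisions we make when creating a task. The sketch below is a minimal illustration, assuming boto3-style lowercase engine names (e.g. `aurora` for Aurora MySQL, `sybase` for SAP ASE) and our own simplified grouping of engines into families; the helper names are hypothetical. The returned strings `full-load`, `cdc` and `full-load-and-cdc` are the `MigrationType` values accepted by the DMS `CreateReplicationTask` API.

```python
# Hypothetical grouping of DMS engine names into "families": engines in the
# same family are treated here as a homogeneous migration. This is a
# simplification for illustration, not an official AWS classification.
ENGINE_FAMILY = {
    "mysql": "mysql",
    "mariadb": "mysql",
    "aurora": "mysql",               # Aurora MySQL
    "postgres": "postgres",
    "aurora-postgresql": "postgres",
    "oracle": "oracle",
    "sqlserver": "sqlserver",
    "db2": "db2",
    "sybase": "sybase",              # SAP ASE
    "mongodb": "mongodb",
    "redshift": "redshift",
    "s3": "s3",
    "dynamodb": "dynamodb",
}


def is_heterogeneous(source_engine: str, target_engine: str) -> bool:
    """Return True when source and target belong to different engine families."""
    try:
        return ENGINE_FAMILY[source_engine] != ENGINE_FAMILY[target_engine]
    except KeyError as exc:
        raise ValueError(f"unknown engine name: {exc.args[0]}") from None


def choose_migration_type(copy_existing_data: bool, replicate_changes: bool) -> str:
    """Map the two migration styles onto a DMS MigrationType value."""
    if copy_existing_data and replicate_changes:
        return "full-load-and-cdc"   # initial load, then ongoing replication
    if copy_existing_data:
        return "full-load"           # one-time load only
    if replicate_changes:
        return "cdc"                 # ongoing changes only
    raise ValueError("a task must either load data or replicate changes")


if __name__ == "__main__":
    print(is_heterogeneous("mysql", "aurora"))               # False
    print(is_heterogeneous("oracle", "aurora-postgresql"))   # True
    print(choose_migration_type(True, True))                 # full-load-and-cdc
```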
Anatomy
Endpoints
Below are the key structural components of DMS. Naturally, we start with our endpoints, each of which must be defined as either a source or a target when it is created.
Along with this, we configure our authentication information and optional connection attributes.
We can modify connection attributes in order to override particular settings within the DMS agent’s session, depending on your source/target database.
By default, DMS loads the tables in alphabetical order which isn’t always desirable depending on your database structure.
At Ubertas Consulting, we regularly encounter relational database tables with foreign key constraints, and these can cause errors because of the order in which the DMS agent loads the tables (a child table may be loaded before the parent table it references). For this reason, we often update the connection attributes of a MySQL-compatible target endpoint to include the following:
initstmt=SET FOREIGN_KEY_CHECKS=0;
This ensures that we don’t receive false-positive foreign key errors during the full load.
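Extra connection attributes are passed to DMS as a single string of `key=value` pairs separated by semicolons. The sketch below assumes a MySQL-compatible target, for which AWS documents the attribute as `initstmt=SET FOREIGN_KEY_CHECKS=0;`. It merges that setting into an existing attribute string and builds keyword arguments for the boto3 DMS client's `create_endpoint` call without contacting AWS. The helper names, hostname, identifier and credentials are hypothetical placeholders.

```python
def parse_attributes(attrs: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a dict, splitting each pair on the first '='
    only, because values such as 'SET FOREIGN_KEY_CHECKS=0' contain '='."""
    result = {}
    for pair in attrs.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result


def format_attributes(attrs: dict) -> str:
    """Serialise a dict back into the semicolon-separated form DMS expects."""
    return "".join(f"{key}={value};" for key, value in attrs.items())


def mysql_target_endpoint_kwargs(existing_attrs: str = "") -> dict:
    """Build hypothetical create_endpoint arguments for a MySQL-compatible
    target, disabling foreign key checks in the DMS agent's session."""
    attrs = parse_attributes(existing_attrs)
    attrs["initstmt"] = "SET FOREIGN_KEY_CHECKS=0"
    return {
        "EndpointIdentifier": "example-mysql-target",   # placeholder
        "EndpointType": "target",
        "EngineName": "mysql",
        "ServerName": "target.example.internal",        # placeholder
        "Port": 3306,
        "Username": "dms_user",                         # placeholder
        "Password": "change-me",                        # placeholder; keep real secrets out of code
        "ExtraConnectionAttributes": format_attributes(attrs),
    }
```

The resulting dict could then be passed as `boto3.client("dms").create_endpoint(**kwargs)`. Parsing before merging means that re-applying the setting replaces any existing `initstmt` rather than duplicating it.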
On either side of our endpoints sit our source and target databases. They don’t need to be hosted within AWS, but their engines must be supported by the DMS agent, which runs on the replication instance.
Replication Instances
Next up we have our replication instances, which are what we are actually charged for when using DMS. AWS don’t charge for tasks or endpoints. Beyond your replication instances, the only costs associated with a migration project should come from the following services:
- CloudWatch: keep an eye on storage charges if you ever need to enable detailed (debug-level) logging within your tasks. This results in SQL statements and other verbose information being sent to your CloudWatch log groups/streams.
- Data Transfer: depending on where your source/target databases are located relative to your replication instances, data transferred in or out may incur per-GB charges.
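Logging verbosity is set in the `Logging` section of a task's settings JSON. As a minimal sketch, the function below builds that section, raising the severity only for the components we choose and leaving the rest at the default level to keep CloudWatch volume down. The component IDs and severity names are, to our knowledge, those used in DMS task settings, but they should be checked against the AWS task-settings documentation; the helper itself is hypothetical.

```python
import json

# Log component IDs and severities as we understand them from the DMS task
# settings documentation; verify against the current docs before use.
LOG_COMPONENTS = ["SOURCE_UNLOAD", "TARGET_LOAD", "SOURCE_CAPTURE", "TARGET_APPLY", "TASK_MANAGER"]
SEVERITIES = {
    "LOGGER_SEVERITY_ERROR",
    "LOGGER_SEVERITY_WARNING",
    "LOGGER_SEVERITY_INFO",
    "LOGGER_SEVERITY_DEFAULT",
    "LOGGER_SEVERITY_DEBUG",
    "LOGGER_SEVERITY_DETAILED_DEBUG",
}


def logging_settings(severity: str = "LOGGER_SEVERITY_DEFAULT", verbose_components=()) -> dict:
    """Return a task-settings fragment that enables CloudWatch logging.

    Components named in `verbose_components` get `severity`; all others stay
    at the default level, limiting how much data reaches CloudWatch."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    components = [
        {
            "Id": component,
            "Severity": severity if component in verbose_components else "LOGGER_SEVERITY_DEFAULT",
        }
        for component in LOG_COMPONENTS
    ]
    return {"Logging": {"EnableLogging": True, "LogComponents": components}}


if __name__ == "__main__":
    # Detailed debugging only for the stage that applies changes to the target.
    settings = logging_settings("LOGGER_SEVERITY_DETAILED_DEBUG", verbose_components={"TARGET_APPLY"})
    print(json.dumps(settings, indent=2))
```

Scoping detailed debugging to a single component is a design choice: it keeps the diagnostic detail where the problem is suspected while avoiding the storage charges described above.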
Our replication instances are backed by Amazon EC2, and much of the configuration is abstracted away from us.
The configuration options are limited to a subset of instance types (listed below), disk size (gp2 volumes only), VPC/subnets and whether the instance is publicly accessible. We can also enable Multi-AZ, which deploys the replication instance in a highly available configuration so that, in the event of an outage, it can fail over and prevent your critical migrations from being disrupted.
DMS Replication Instance Types:
- dms.t2.micro through dms.t2.large
- dms.c4.large through dms.c4.4xlarge
- dms.r4.large through dms.r4.8xlarge
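These configuration options map fairly directly onto the parameters of the boto3 DMS client's `create_replication_instance` call. The sketch below assembles those arguments and rejects instance classes outside the families listed above; the identifiers, subnet group and security group IDs are hypothetical placeholders, and the family check reflects only this article's April 2020 list rather than the current AWS offering.

```python
# Instance class families listed in this article (April 2020). Newer
# families may exist; extend this tuple rather than treating it as definitive.
ALLOWED_FAMILIES = ("dms.t2.", "dms.c4.", "dms.r4.")


def replication_instance_kwargs(
    identifier: str,
    instance_class: str,
    storage_gb: int,
    subnet_group: str,
    security_group_ids: list,
    multi_az: bool = False,
    publicly_accessible: bool = False,
) -> dict:
    """Assemble keyword arguments for DMS create_replication_instance."""
    if not instance_class.startswith(ALLOWED_FAMILIES):
        raise ValueError(f"unsupported instance class: {instance_class}")
    if storage_gb <= 0:
        raise ValueError("allocated storage must be positive")
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,                    # disk size in GB
        "ReplicationSubnetGroupIdentifier": subnet_group,  # determines the VPC/subnets
        "VpcSecurityGroupIds": security_group_ids,
        "MultiAZ": multi_az,                               # standby for failover
        "PubliclyAccessible": publicly_accessible,
    }
```

The dict could be passed as `boto3.client("dms").create_replication_instance(**kwargs)`. Defaulting `PubliclyAccessible` to `False` is our own cautious choice for instances that only need to reach databases inside the VPC.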
Migration Tasks
Finally we have DMS Migration Tasks, which run on our replication instances and represent what exactly it is we are migrating and how we’re doing it. AWS recommend that we break our migration into multiple Migration Tasks, and this is also evident in the service limits which are imposed upon us.
We highly recommend that you spend plenty of time analysing your source database and planning Migration Tasks before diving into your migration. For example, if you have particularly large tables with large-object columns then we recommend creating separate tasks for these. This will allow your other tasks to progress faster without being blocked.
It’s especially important to split out your migration into multiple tasks so that should something go wrong, you are able to respond to failures with more agility.
Default service limits (per AWS account, per region, as of April 2020):
- Replication Instances: 20
- Migration Tasks: 200
- Endpoints: 100
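One way to act on this advice is to give each table with LOB columns its own task, pack the remaining tables into a small number of shared tasks, and check the result against the task limit above. The sketch below does exactly that; the table metadata (row counts and a `has_lob` flag) is hypothetical input you would gather while analysing the source database, and the greedy balancing by row count is our own simple heuristic, not an AWS algorithm.

```python
from dataclasses import dataclass, field

DEFAULT_TASK_LIMIT = 200  # default Migration Task limit quoted above


@dataclass
class Table:
    schema: str
    name: str
    rows: int
    has_lob: bool = False


@dataclass
class TaskPlan:
    tables: list = field(default_factory=list)

    @property
    def rows(self) -> int:
        return sum(t.rows for t in self.tables)


def plan_tasks(tables, shared_tasks: int = 4, task_limit: int = DEFAULT_TASK_LIMIT):
    """Return a list of TaskPlan objects.

    Each LOB table gets a dedicated task so it cannot hold up the others; the
    remaining tables are assigned, largest first, to whichever shared task
    currently holds the fewest rows (greedy balancing)."""
    if shared_tasks < 1:
        raise ValueError("shared_tasks must be at least 1")
    plans = [TaskPlan([t]) for t in tables if t.has_lob]
    regular = sorted((t for t in tables if not t.has_lob), key=lambda t: t.rows, reverse=True)
    buckets = [TaskPlan() for _ in range(min(shared_tasks, len(regular)))]
    for table in regular:
        min(buckets, key=lambda b: b.rows).tables.append(table)
    plans.extend(buckets)
    if len(plans) > task_limit:
        raise ValueError(f"{len(plans)} tasks exceeds the limit of {task_limit}")
    return plans
```

Row count is only a rough proxy for load time; weighting by data size in bytes would likely balance tasks better when row widths vary a lot.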
Console UI
In March 2020, AWS updated the design of the DMS console; the descriptions below reflect the console as of April 2020. The main difference is that the previous sections listed below are now better organised into tabs.
Overview
Within this section we can view the metadata of our Migration Task, and there’s a helpful link to our CloudWatch logs. Logging within your DMS Migration Tasks is optional, but very useful for debugging if you ever experience connection or permission issues.
We can also view our task settings as JSON. This view is deliberately there because JSON is how we update our Migration Tasks: as of April 2020, the updates we can make within the console are still significantly limited, so we must use the AWS CLI to pass updated JSON while the Migration Task is not running.
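The update flow (edit the settings JSON, then apply it while the task is not running) can be sketched as follows. This assumes the task's ARN, status and current settings string were fetched beforehand, for example from the `ReplicationTaskSettings` field returned by `describe_replication_tasks`. Treating only `stopped`, `ready` and `failed` as safe states is our own conservative assumption, and the helper names are hypothetical. The result is shaped for the boto3 DMS client's `modify_replication_task`; the AWS CLI equivalent is `aws dms modify-replication-task` with `--replication-task-settings`.

```python
import json

# Conservative assumption: only modify tasks in these states.
MODIFIABLE_STATES = {"stopped", "ready", "failed"}


def deep_merge(base: dict, changes: dict) -> dict:
    """Recursively merge `changes` into a copy of `base` (changes win)."""
    merged = dict(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def modify_task_kwargs(task_arn: str, status: str, current_settings_json: str, changes: dict) -> dict:
    """Build modify_replication_task arguments, refusing to touch a running task."""
    if status not in MODIFIABLE_STATES:
        raise RuntimeError(f"task is '{status}'; stop it before modifying settings")
    settings = deep_merge(json.loads(current_settings_json), changes)
    return {
        "ReplicationTaskArn": task_arn,
        "ReplicationTaskSettings": json.dumps(settings),
    }
```

Merging into the current settings, rather than sending only the changed keys, keeps the submitted JSON complete and makes the change easy to review as a diff.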
Table statistics
In our opinion, the ability to view the status and progress of each table is the most useful component of the Migration Task console view.
For example, in one of our tasks, two tables within a schema named “auditing” had both completed an initial full load and row validation, and ongoing changes were being captured from the source database’s binary logs.
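The same per-table information is available programmatically through the DMS `DescribeTableStatistics` API. As a sketch, the function below takes the `TableStatistics` list from a boto3 `describe_table_statistics` response and flags tables that have not finished loading, have not passed validation, or reported full-load error rows. The field names and the state strings `Table completed`, `Validated` and `Not enabled` are our reading of the API reference; the sample records are made up for illustration.

```python
def tables_needing_attention(table_statistics):
    """Return (schema.table, reason) pairs for tables that are not yet loaded
    or validated, given the TableStatistics list from describe_table_statistics."""
    issues = []
    for stats in table_statistics:
        name = f"{stats['SchemaName']}.{stats['TableName']}"
        if stats.get("TableState") != "Table completed":
            issues.append((name, f"load state: {stats.get('TableState')}"))
        elif stats.get("ValidationState") not in (None, "Validated", "Not enabled"):
            issues.append((name, f"validation state: {stats['ValidationState']}"))
        elif stats.get("FullLoadErrorRows", 0) > 0:
            issues.append((name, f"{stats['FullLoadErrorRows']} full-load error rows"))
    return issues


# Hypothetical sample records shaped like the API response.
sample = [
    {"SchemaName": "auditing", "TableName": "events", "TableState": "Table completed",
     "ValidationState": "Validated", "FullLoadErrorRows": 0},
    {"SchemaName": "auditing", "TableName": "logins", "TableState": "Full load",
     "ValidationState": "Pending records", "FullLoadErrorRows": 0},
]
print(tables_needing_attention(sample))  # → [('auditing.logins', 'load state: Full load')]
```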
CloudWatch metrics
Within the CloudWatch metrics section, we can easily monitor a particular Migration Task’s metrics without having to go to CloudWatch directly.
However, we can still create custom dashboards within CloudWatch, for example if we would like to create aggregated views that summarise the overall progress of our database migration.
For a detailed explanation of the CloudWatch metrics available for DMS Migration Tasks, you can check out this AWS documentation page:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Monitoring.html
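For an aggregated view, a CloudWatch dashboard can be defined as JSON and created with the boto3 CloudWatch client's `put_dashboard` call. The sketch below builds a single latency graph across several tasks. It assumes the `AWS/DMS` namespace, the `ReplicationInstanceIdentifier` and `ReplicationTaskIdentifier` dimensions, and the `CDCLatencySource`/`CDCLatencyTarget` metrics described on the page above. The instance and task identifiers and the region are hypothetical placeholders; the task identifier used as a metric dimension may be the task's internal resource ID rather than its friendly name, so it is safest to copy it from the metrics console.

```python
import json


def latency_dashboard(instance_id: str, task_ids, region: str) -> str:
    """Return a CloudWatch dashboard body (JSON string) graphing source and
    target CDC latency for each task on one replication instance."""
    metrics = []
    for task_id in task_ids:
        for metric_name in ("CDCLatencySource", "CDCLatencyTarget"):
            metrics.append([
                "AWS/DMS", metric_name,
                "ReplicationInstanceIdentifier", instance_id,
                "ReplicationTaskIdentifier", task_id,
            ])
    body = {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 24, "height": 6,
                "properties": {
                    "metrics": metrics,
                    "view": "timeSeries",
                    "stat": "Average",
                    "period": 300,
                    "region": region,
                    "title": "DMS CDC latency (seconds)",
                },
            }
        ]
    }
    return json.dumps(body)
```

The returned string could be passed as `boto3.client("cloudwatch").put_dashboard(DashboardName="dms-migration", DashboardBody=...)`, where the dashboard name is likewise a placeholder.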
Mapping rules
This is an area that we alluded to in an earlier section of this article: table mappings are what allow us to break a migration project into multiple Migration Tasks, each responsible for its own subset of schemas and tables. We can update our table mappings either by using the Console UI or via JSON.
As well as specifying particular schemas within the database to exclude/include, we can also apply filters to tables if for example we have large tables and want to migrate our data in smaller chunks.
For more information on Mapping rules, you can visit the relevant AWS documentation page here:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html
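As an illustration of migrating a large table in smaller chunks, the sketch below generates one table-mapping document per ID range, each containing a selection rule with a `between` source filter. The rule structure (`rule-type`, `object-locator`, `filters`, `filter-conditions`) follows the table-mapping JSON format on the page above; the schema, table and column names, the chunk size and the helper name are hypothetical. Because `between` matches both end points, the ranges are built so that they never overlap.

```python
import json


def chunked_table_mappings(schema: str, table: str, column: str, max_id: int, chunk_size: int):
    """Yield one table-mapping JSON string per ID range [start, end].

    Each mapping includes only rows of schema.table whose `column` value lies
    in its range; 'between' is inclusive, so consecutive ranges do not overlap."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for index, start in enumerate(range(1, max_id + 1, chunk_size), start=1):
        end = min(start + chunk_size - 1, max_id)
        mapping = {
            "rules": [
                {
                    "rule-type": "selection",
                    "rule-id": str(index),
                    "rule-name": f"{table}-chunk-{index}",
                    "object-locator": {"schema-name": schema, "table-name": table},
                    "rule-action": "include",
                    "filters": [
                        {
                            "filter-type": "source",
                            "column-name": column,
                            "filter-conditions": [
                                {
                                    "filter-operator": "between",
                                    "start-value": str(start),
                                    "end-value": str(end),
                                }
                            ],
                        }
                    ],
                }
            ]
        }
        yield json.dumps(mapping)
```

Each string could be supplied as the `TableMappings` argument of a separate `create_replication_task` call. Rows with IDs outside 1 to `max_id` (or NULL IDs) would not match any range, so in practice the last chunk might need an open-ended condition such as `gte`.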
Within this article, we explained the basic anatomy of DMS and how it works.
In part 2, we will dive deeper into how to optimise your Migration Tasks for speed and agility.
