# README
migrationload
Migration Load is a simple tool that allows the user to perform back-to-back migrations in a WANdisco Live Data Migrator (LDM) environment.
How to run
Download the binary to your source LDM node and run
$ migrationload -p /path/to/migrate -m migration-name
This will load data on the source cluster into `/path/to/migrate`, then create a new scan-only migration called `migration-name`, which will start once successfully created.
The tool will continue to run until told to stop, and writes its output to stdout in the terminal.
Options
There are a number of options that can be passed to `migrationload`:
Long Operator | Short Operator | Default Value | Description |
---|---|---|---|
-path | -p | | The parent path to load the source data into. |
-migration | -m | | The name to use for the migration. |
-num | -n | 1 | The number of migrations to run side by side. |
-wait | -w | 1m | The time to wait between polling for migration completion. |
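
The options table can be illustrated with a small parser. This is a minimal sketch in Python, not the tool's actual implementation: the helper `parse_wait`, the accepted unit suffixes (`s`, `m`, `h`), and treating `-path` and `-migration` as required are all assumptions, since the table only shows that they have no default.

```python
import argparse
import re

# Assumption: wait times use a number plus a unit suffix; the README only shows "1m".
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_wait(value: str) -> int:
    """Hypothetical helper: convert a value such as "30s" or "1m" to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid wait time: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the long/short operators in the table."""
    parser = argparse.ArgumentParser(prog="migrationload")
    # Assumption: options without a default value are mandatory.
    parser.add_argument("-p", "-path", dest="path", required=True,
                        help="parent path to load the source data into")
    parser.add_argument("-m", "-migration", dest="migration", required=True,
                        help="name to use for the migration")
    parser.add_argument("-n", "-num", dest="num", type=int, default=1,
                        help="number of migrations to run side by side")
    parser.add_argument("-w", "-wait", dest="wait", type=parse_wait,
                        default=parse_wait("1m"),
                        help="time to wait between polls for completion")
    return parser


# Same arguments as the example in "How to run"; defaults fill in the rest.
args = build_parser().parse_args(["-p", "/path/to/migrate", "-m", "migration-name"])
print(args.path, args.migration, args.num, args.wait)
```

Storing the wait time in seconds keeps the polling loop simple, at the cost of rejecting compound values such as `1m30s`.
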
Flow of operations
An overview of how the tool works:
- The arguments passed to the tool will be parsed, and the internal configuration set appropriately.
- Set additional configuration items that are gathered from the system:
  - A client to connect to the local HDFS filesystem
  - The client to connect to the LDM API
  - The target filesystem to migrate data to
    NOTE: Currently, there is an assumption that there will only be a single filesystem
- Start the migration process
  - Create the parent path on the source filesystem
  - Generate random data in the parent path
  - Create the migration with the Scan Only Migration and Auto Start options set
  - After the wait time has passed, the tool will poll the LDM API to check whether the migration has completed or not
    - If it has not completed, the tool will wait again
    - If it has completed, the tool will:
      - Delete the path on the target filesystem
      - Remove the migration via the LDM API
      - Repeat all of the steps to recreate the migration
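
The migration steps above can be sketched as a loop. This is an illustrative sketch only: `FakeLDM` is an in-memory stand-in, and every method name on it (`create_migration`, `is_complete`, `delete_target_path`, `remove_migration`) is hypothetical rather than taken from the LDM API. The `cycles` argument is also an assumption, added so the example terminates; the tool itself runs until told to stop.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeLDM:
    """In-memory stand-in for the LDM API; all method names are hypothetical."""
    polls_until_done: int = 2
    migrations: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def create_migration(self, name, path, scan_only=True, auto_start=True):
        # Track how many polls remain before this migration reports completion.
        self.migrations[name] = self.polls_until_done
        self.log.append(f"create {name}")

    def is_complete(self, name):
        self.migrations[name] -= 1
        return self.migrations[name] <= 0

    def delete_target_path(self, path, recursive=True):
        self.log.append(f"delete {path}")

    def remove_migration(self, name):
        del self.migrations[name]
        self.log.append(f"remove {name}")


def run_cycles(ldm, load_data, path, name, wait_seconds, cycles):
    """Run `cycles` back-to-back migrations following the flow described above."""
    for _ in range(cycles):
        load_data(path)                    # create the parent path and random data
        ldm.create_migration(name, path)   # scan only, started automatically
        while True:
            time.sleep(wait_seconds)       # wait first, then poll
            if ldm.is_complete(name):
                break
        ldm.delete_target_path(path)       # clean up the target filesystem
        ldm.remove_migration(name)         # remove the migration, then start over


ldm = FakeLDM()
run_cycles(
    ldm,
    load_data=lambda p: ldm.log.append(f"load {p}"),
    path="/path/to/migrate",
    name="migration-name",
    wait_seconds=0,
    cycles=2,
)
print(ldm.log)
```

Passing the data loader and the API client in as arguments keeps the loop itself free of any HDFS or HTTP detail, which makes it easy to exercise with fakes like the one here.
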
Internal Default Options
Random File Generator
When the random data is generated, the following options are used:
Option | Value | Description |
---|---|---|
filesize | 10000000 | The maximum file size in bytes |
depth | 3 | How deep the directory tree should go |
width | 2 | The number of subdirectories per directory |
files | 15 | The number of files |
randomfanout | true | Whether to randomise the number of files per directory |
concurrent | 5 | How many files to make at a time |
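
To show how these options interact, here is a minimal sketch of a random tree generator. It makes several assumptions that the table does not confirm: it writes to the local filesystem rather than HDFS, it treats `files` as a per-directory count (an upper bound when `randomfanout` is true), and the function names `plan_tree`, `write_random_file`, and `generate` are hypothetical.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_random_file(path, max_size):
    """Write between 1 and max_size random bytes to path and return the size."""
    size = random.randint(1, max_size)
    with open(path, "wb") as handle:
        handle.write(os.urandom(size))
    return size


def plan_tree(root, depth, width, files, random_fanout):
    """Create the directory tree and return the file paths to be written."""
    planned = []
    level = [root]
    for _ in range(depth):
        next_level = []
        for directory in level:
            for i in range(width):
                child = os.path.join(directory, f"dir{i}")
                os.makedirs(child, exist_ok=True)
                next_level.append(child)
                # Assumption: `files` is per directory, randomised if requested.
                count = random.randint(1, files) if random_fanout else files
                planned += [os.path.join(child, f"file{j}") for j in range(count)]
        level = next_level
    return planned


def generate(root, filesize=10_000_000, depth=3, width=2, files=15,
             random_fanout=True, concurrent=5):
    """Generate random data under root, writing `concurrent` files at a time."""
    planned = plan_tree(root, depth, width, files, random_fanout)
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        sizes = list(pool.map(lambda p: write_random_file(p, filesize), planned))
    return planned, sizes


# Small values keep the demonstration quick; the defaults mirror the table.
with tempfile.TemporaryDirectory() as root:
    paths, sizes = generate(root, filesize=1024, depth=2, width=2, files=3,
                            random_fanout=False)
    print(len(paths), max(sizes) <= 1024)
```
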
Create Migration
When the migration is created via the LDM API, the following options are used:
Option | Value | Description |
---|---|---|
ScanOnlyMigration | true | Whether the migration is created as scan only, and does not move on to be live afterwards |
AutoStart | true | Whether the migration should start automatically when the creation is successful |
Delete Request
When the LDM API call is made to delete the path on the target filesystem, the following option is used:
Option | Value | Description |
---|---|---|
Recursive | true | Whether to recursively delete the path |
Limitations
- At present the tool will work only with an HDFS source filesystem.
- If there are multiple target filesystems, it will use the first one that is returned by the LDM API.
To Do
The following work is planned:
- The use of a configuration file to store the configuration properties rather than using flags
- Allow user to choose which target to use when there are multiple target filesystems available
- Additional source filesystems beyond HDFS
- Add a subcommand `setup` to make sure that any configuration items are in place.
  - Include adding a service setup for `systemctl` so that it can be run in the background
  - Include logrotate setup, so that it logs to `/var/log`