github.com/jbuchbinder/migrator (module, package), version 0.1.10
Repository: https://github.com/jbuchbinder/migrator.git
Documentation: pkg.go.dev

# README

MIGRATOR


ETL / data migrator.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| BatchSize | integer | 1000 | Extractor: Number of rows polled from the source database at a time |
| Debug | bool | false | Show additional debugging information |
| InsertBatchSize | integer | 100 | Loader: Number of rows inserted per statement |
| OnlyPast | bool | false | Extractor (timestamp): Only poll for timestamps in the past (#1) |
| SequentialReplace | bool | false | Loader: Use REPLACE instead of INSERT for sequentially extracted data |
| SleepBetweenRuns | integer | 5 | Migrator: Seconds to sleep when no data has been found |
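
As a rough illustration of how these defaults might be applied, the sketch below reads values from an untyped parameter set and falls back to the documented default when a key is missing. The `Parameters` map type and the `intParam` / `boolParam` helpers are assumptions made for this example, not the package's confirmed API.

```go
package main

import "fmt"

// Parameters is a stand-in for the package's untyped parameter set.
// Assumption: a plain map keyed by parameter name; the real type may differ.
type Parameters map[string]interface{}

// intParam returns the named integer parameter, or def when the key is
// missing or holds a non-integer value. Hypothetical helper.
func intParam(p Parameters, name string, def int) int {
	if v, ok := p[name].(int); ok {
		return v
	}
	return def
}

// boolParam is the boolean counterpart of intParam. Hypothetical helper.
func boolParam(p Parameters, name string, def bool) bool {
	if v, ok := p[name].(bool); ok {
		return v
	}
	return def
}

func main() {
	p := Parameters{"BatchSize": 500, "OnlyPast": true}

	fmt.Println(intParam(p, "BatchSize", 1000))      // explicitly set
	fmt.Println(intParam(p, "InsertBatchSize", 100)) // falls back to the default
	fmt.Println(boolParam(p, "OnlyPast", false))     // explicitly set
	fmt.Println(intParam(p, "SleepBetweenRuns", 5))  // falls back to the default
}
```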

Extractors

  • Sequential: Tracks status via a table's primary key to determine whether its entries have been migrated. Useful for read-only data which is written in sequence and never updated.
  • Timestamp: Tracks status via a table's written timestamp column to determine whether table entries have been migrated from that point on.
  • Queue: Tracks status via a trigger-populated table containing indexed entries which need to be migrated. Works for all kinds of data, but requires modifying the source database to add INSERT and UPDATE triggers.
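
To make the sequential and timestamp strategies more concrete, here is a minimal sketch of the polling queries such extractors could issue. The function names and the exact SQL are assumptions for illustration; in particular, reading `OnlyPast` as an extra `<= NOW()` condition is a guess at its meaning, and the package's own queries may differ.

```go
package main

import "fmt"

// sequentialQuery builds a polling query that selects rows whose primary key
// is greater than the last migrated position. The caller binds the stored
// position and the batch size to the two placeholders. Hypothetical helper.
func sequentialQuery(table, pkColumn string) string {
	return fmt.Sprintf("SELECT * FROM `%s` WHERE `%s` > ? ORDER BY `%s` ASC LIMIT ?",
		table, pkColumn, pkColumn)
}

// timestampQuery builds the timestamp-based equivalent. When onlyPast is
// true, rows stamped in the future are excluded (assumed OnlyPast semantics).
func timestampQuery(table, tsColumn string, onlyPast bool) string {
	q := fmt.Sprintf("SELECT * FROM `%s` WHERE `%s` > ?", table, tsColumn)
	if onlyPast {
		q += fmt.Sprintf(" AND `%s` <= NOW()", tsColumn)
	}
	return q + fmt.Sprintf(" ORDER BY `%s` ASC LIMIT ?", tsColumn)
}

func main() {
	fmt.Println(sequentialQuery("orders", "id"))
	fmt.Println(timestampQuery("orders", "updatedAt", true))
}
```

A strict `>` comparison on a timestamp column can skip rows that share the boundary timestamp of the previous batch, so a real extractor needs some policy for ties.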

Tracking Table

CREATE TABLE `EtlTracking` (
	sourceDatabase		VARCHAR(100) DEFAULT '',
	sourceTable		VARCHAR(100) DEFAULT '',
	columnName		VARCHAR(100) DEFAULT '',
	sequentialPosition	BIGINT DEFAULT 0,
	timestampPosition	TIMESTAMP NULL DEFAULT NULL,
	lastRun			TIMESTAMP NULL DEFAULT NULL
);

RecordQueue Table

CREATE TABLE `MigratorRecordQueue` (
	sourceDatabase		VARCHAR(100) NOT NULL,
	sourceTable			VARCHAR(100) NOT NULL,
	pkColumn 			VARCHAR(100) NOT NULL,
	pkValue 			VARCHAR(100) NOT NULL,
	timestampUpdated 	TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

	KEY (sourceDatabase, sourceTable),
	KEY (timestampUpdated)
);

# Functions

BatchedInsert takes an array of SQL data rows and creates a series of batched inserts to insert the data into an existing sql.Tx (transaction) object.
BatchedQuery takes an array of SQL data rows and creates a series of batched queries to insert/replace the data into an existing sql.Tx (transaction) object.
BatchedRemove takes an array of SQL data rows and creates a series of DELETE FROM statements to remove the data in an existing sql.Tx (transaction) object.
BatchedReplace takes an array of SQL data rows and creates a series of batched replaces to replace the data into an existing sql.Tx (transaction) object.
CreateTrackingTable attempts to create the tracking table for the specified database connection.
FileExists reports whether the named file or directory exists.
GetTrackingStatus retrieves a TrackingStatus object from its underlying database table.
GetTrackingStatusSequential retrieves the sequentialPosition for a TrackingStatus from its underlying database table.
GetTrackingStatusTimestamp retrieves the timestampPosition for a TrackingStatus from its underlying database table.
MigratorStateFromString derives a migrator state from a string.
NullTimeFromTime creates a new NullTime instance with the specified time.
NullTimeNow creates a new NullTime instance representing the current time.
OpenQueue creates an instance of a FIFO queue.
ParseDSN parses the given go-sql-driver/mysql datasource name.
RemoveRecordQueueItem removes an item from the record queue.
SerializeNewTrackingStatus serializes a TrackingStatus object to its database table.
SerializeTrackingStatus serializes a copy of an actively modified TrackingStatus to its underlying database table.
SetLogger sets a logrus Logger object used by the migrator.
SetTrackingStatusSequential updates a TrackingStatus object's sequentialPosition in its underlying database table.
SetTrackingStatusTimestamp updates a TrackingStatus object's timestampPosition in its underlying database table.
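
The batching idea behind BatchedInsert and BatchedReplace can be sketched as follows: rows are split into groups of at most the batch size, and each group becomes one multi-row statement. The `row` and `statement` types and the `batchInserts` function are hypothetical and do not reproduce the package's actual signatures; executing the statements inside a sql.Tx is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a single untyped record keyed by column name (hypothetical type).
type row map[string]interface{}

// statement pairs SQL text with its bound arguments (hypothetical type).
type statement struct {
	SQL  string
	Args []interface{}
}

// batchInserts splits rows into groups of at most batchSize and builds one
// multi-row statement per group. verb is "INSERT" or "REPLACE".
func batchInserts(verb, table string, columns []string, rows []row, batchSize int) []statement {
	if batchSize < 1 {
		batchSize = 1
	}
	quoted := make([]string, len(columns))
	for i, c := range columns {
		quoted[i] = "`" + c + "`"
	}
	// One "(?, ?, ...)" placeholder group per row.
	group := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(columns)), ", ") + ")"

	var out []statement
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		chunk := rows[start:end]
		groups := make([]string, len(chunk))
		args := make([]interface{}, 0, len(chunk)*len(columns))
		for i, r := range chunk {
			groups[i] = group
			for _, c := range columns {
				args = append(args, r[c])
			}
		}
		out = append(out, statement{
			SQL: fmt.Sprintf("%s INTO `%s` (%s) VALUES %s",
				verb, table, strings.Join(quoted, ", "), strings.Join(groups, ", ")),
			Args: args,
		})
	}
	return out
}

func main() {
	rows := []row{{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}}
	for _, s := range batchInserts("INSERT", "users", []string{"id", "name"}, rows, 2) {
		fmt.Println(s.SQL, s.Args)
	}
}
```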

# Constants

S_INVALID represents an invalid state.
S_NEW is the status of a new migrator instance.
S_PAUSED is the status of a migrator which has been paused.
S_RUNNING is the status of a migrator which has been initialized.
S_STARTING is the status of a migrator when a start has been requested.
S_STOPPED is the status of a migrator when it has been stopped.
S_STOPPING is the status of a migrator when a stop has been requested.
S_TERMINATED is the status of a migrator instance which is not running due to intervention.
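
A function such as MigratorStateFromString could map names onto states along the lines of the sketch below. The `state` type, the string spellings, and the fallback to an invalid state are all assumptions for this example; the package's real values and parsing rules are not shown in this document, and only a subset of the states is included.

```go
package main

import (
	"fmt"
	"strings"
)

// state is a hypothetical stand-in for the package's migrator state type.
type state int

const (
	stateInvalid state = iota
	stateNew
	stateStarting
	stateRunning
	statePaused
	stateStopped
	stateTerminated
)

// stateNames maps assumed string spellings to states.
var stateNames = map[string]state{
	"NEW":        stateNew,
	"STARTING":   stateStarting,
	"RUNNING":    stateRunning,
	"PAUSED":     statePaused,
	"STOPPED":    stateStopped,
	"TERMINATED": stateTerminated,
}

// stateFromString derives a state from a string, ignoring case and
// surrounding whitespace. Unknown names yield stateInvalid.
func stateFromString(s string) state {
	if st, ok := stateNames[strings.ToUpper(strings.TrimSpace(s))]; ok {
		return st
	}
	return stateInvalid
}

func main() {
	fmt.Println(stateFromString("running") == stateRunning)
	fmt.Println(stateFromString("bogus") == stateInvalid)
}
```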

# Variables

DefaultBatchSize represents the default size of extracted batches.
DefaultLoader represents a default Loader instance.
DefaultTransformer by default does nothing -- the data is not transformed.
ExtractorMap is a map of Extractor functions which can be used to instantiate an Extractor based only on a string.
ExtractorQueue is an Extractor instance which uses a table populated by INSERT or UPDATE triggers to notify the extractor that it needs to replicate a row.
ExtractorSequential is an Extractor instance which uses the primary key sequence to determine which rows should be extracted from the source database table.
ExtractorTimestamp is an Extractor instance which uses a DATETIME/TIMESTAMP field to determine which rows to pull from the source database table.
ExtractorTimestampFallback is an Extractor instance which uses a DATETIME/TIMESTAMP field to determine which rows to pull from the source database table.
ParamBatchSize is the parameter used to specify general batch processing size for polling records from the database.
ParamDebug is the parameter used to enable basic debugging code in modules.
ParamInsertBatchSize is the parameter used by the default loader to batch queries.
ParamLowLevelDebug is the parameter used to enable lower level debugging code in modules.
ParamMethod is the parameter name which specifies the insert or update method being used by portions of the migrator.
ParamOnlyPast is the parameter for timestamp-based polling which only polls for timestamps in the past.
ParamSequentialReplace is the parameter for loading which uses REPLACE instead of INSERT for sequentially extracted data.
ParamSleepBetweenRuns is the parameter which defines the amount of time between runs in seconds.
ParamTableName is the parameter for an adjusted table name.
RecordQueueTable is the name of the table which queues entries from source tables that have no update timestamp field.
TableRenamerTransformer adjusts the table name of a destination table based on the "TableName" parameter passed.
TrackingTableName represents the name of the database table used to track TrackingStatus instances, and exists within the target database.
TransformerMap is a map of Transformer functions which can be used to instantiate a Transformer based only on a string.

# Structs

Iteration defines the individual sub-migrator configuration which replicates a single table.
Migrator represents an object which encompasses an entire end-to-end ETL process.
NullTime represents a time.Time object which can also represent a NULL DATETIME / TIMESTAMP value in MySQL.
PersistenceQueue is a wrapper around goque, a LevelDB-backed persistent queue library.
RecordQueue is the table definition for the queue table which tracks updates to tables that do not have a lastUpdated or equivalent field.
SQLRow represents a single row of SQL data with an action associated with it.
TableData represents identifying information and data for a table.
TrackingStatus is the table definition for the tracking table which maintains the ETL positioning.

# Type aliases

Extractor is a callback function type.
Loader is a callback function type.
Parameters represents a series of untyped parameters which are passed to Extractors, Transformers, and Loaders.
SQLUntypedRow represents a single row of SQL data which is not strongly typed to a structure.
Transformer is a callback function type which transforms an array of untyped information into another array of untyped information.
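
As an illustration of the callback style, the sketch below defines a transformer that renames destination tables from a "TableName" parameter, similar in spirit to TableRenamerTransformer. The `parameters`, `tableData`, and `transformer` types are hypothetical, and the package's real callback signature is not shown here.

```go
package main

import "fmt"

// parameters is a stand-in for the package's untyped parameter set.
type parameters map[string]interface{}

// tableData carries a destination table name and its rows (hypothetical type).
type tableData struct {
	TableName string
	Rows      []map[string]interface{}
}

// transformer is an assumed callback shape: data in, data out.
type transformer func(p parameters, in []tableData) []tableData

// renameTable returns copies of the input with the table name replaced by
// the "TableName" parameter. Without that parameter the data passes through.
func renameTable(p parameters, in []tableData) []tableData {
	name, ok := p["TableName"].(string)
	if !ok || name == "" {
		return in
	}
	out := make([]tableData, len(in))
	for i, td := range in {
		td.TableName = name // td is a copy, so the input is left untouched
		out[i] = td
	}
	return out
}

func main() {
	var t transformer = renameTable
	in := []tableData{{TableName: "orders"}}
	out := t(parameters{"TableName": "orders_archive"}, in)
	fmt.Println(in[0].TableName, "->", out[0].TableName)
}
```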