github.com/aws/aws-cdk-go/awscdkgluealpha/v2

2.166.0-alpha.0

Repository: https://github.com/aws/aws-cdk-go.git

Documentation: pkg.go.dev

# README

AWS Glue Construct Library

---

The APIs of higher level constructs in this module are experimental and under active development. They are subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model and breaking changes will be announced in the release notes. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

This module is part of the AWS Cloud Development Kit project.

Job

A Job encapsulates a script that connects to data sources, processes them, and then writes output to a data target.

There are 3 types of jobs supported by AWS Glue: Spark ETL, Spark Streaming, and Python Shell jobs.

The glue.JobExecutable allows you to specify the type of job, the language to use and the code assets required by the job.

glue.Code allows you to refer to the different code assets required by the job, either from an existing S3 location or from a local file path.

glue.ExecutionClass allows you to specify FLEX or STANDARD. FLEX is appropriate for non-urgent jobs such as pre-production jobs, testing, and one-time data loads.

Spark Jobs

These jobs run in an Apache Spark environment managed by AWS Glue.

ETL Jobs

An ETL job processes data in batches using Apache Spark.

var bucket s3.Bucket // s3 is the awss3 package

glue.NewJob(this, jsii.String("ScalaSparkEtlJob"), &glue.JobProps{
	Executable: glue.JobExecutable_ScalaEtl(&glue.ScalaJobExecutableProps{
		GlueVersion: glue.GlueVersion_V4_0(),
		Script: glue.Code_FromBucket(bucket, jsii.String("src/com/example/HelloWorld.scala")),
		ClassName: jsii.String("com.example.HelloWorld"),
		ExtraJars: &[]glue.Code{
			glue.Code_FromBucket(bucket, jsii.String("jars/HelloWorld.jar")),
		},
	}),
	WorkerType: glue.WorkerType_G_8X(),
	Description: jsii.String("an example Scala ETL job"),
})

Streaming Jobs

A Streaming job is similar to an ETL job, except that it performs ETL on data streams. It uses the Apache Spark Structured Streaming framework. Some Spark job features are not available to streaming ETL jobs.

glue.NewJob(this, jsii.String("PythonSparkStreamingJob"), &glue.JobProps{
	Executable: glue.JobExecutable_PythonStreaming(&glue.PythonSparkJobExecutableProps{
		GlueVersion: glue.GlueVersion_V4_0(),
		PythonVersion: glue.PythonVersion_THREE,
		// Local script path built with path/filepath; nil selects the default asset options.
		Script: glue.Code_FromAsset(jsii.String(filepath.Join("job-script", "hello_world.py")), nil),
	}),
	Description: jsii.String("an example Python Streaming job"),
})

Python Shell Jobs

A Python shell job runs Python scripts as a shell and supports a Python version that depends on the AWS Glue version you are using. This can be used to schedule and run tasks that don't require an Apache Spark environment. Currently, three flavors are supported:

PythonVersion.TWO (2.7; EOL)
PythonVersion.THREE (3.6)
PythonVersion.THREE_NINE (3.9)

var bucket s3.Bucket // s3 is the awss3 package

glue.NewJob(this, jsii.String("PythonShellJob"), &glue.JobProps{
	Executable: glue.JobExecutable_PythonShell(&glue.PythonShellExecutableProps{
		GlueVersion: glue.GlueVersion_V1_0(),
		PythonVersion: glue.PythonVersion_THREE,
		Script: glue.Code_FromBucket(bucket, jsii.String("script.py")),
	}),
	Description: jsii.String("an example Python Shell job"),
})

Ray Jobs

These jobs run in a Ray environment managed by AWS Glue.

glue.NewJob(this, jsii.String("RayJob"), &glue.JobProps{
	Executable: glue.JobExecutable_PythonRay(&glue.PythonRayExecutableProps{
		GlueVersion: glue.GlueVersion_V4_0(),
		PythonVersion: glue.PythonVersion_THREE_NINE,
		Runtime: glue.Runtime_RAY_TWO_FOUR(),
		// Local script path built with path/filepath; nil selects the default asset options.
		Script: glue.Code_FromAsset(jsii.String(filepath.Join("job-script", "hello_world.py")), nil),
	}),
	WorkerType: glue.WorkerType_Z_2X(),
	WorkerCount: jsii.Number(2),
	Description: jsii.String("an example Ray job"),
})

Enable Spark UI

Enable the Spark UI by setting the sparkUI property.

glue.NewJob(this, jsii.String("EnableSparkUI"), &glue.JobProps{
	JobName: jsii.String("EtlJobWithSparkUIPrefix"),
	SparkUI: &glue.SparkUIProps{
		Enabled: jsii.Boolean(true),
	},
	Executable: glue.JobExecutable_PythonEtl(&glue.PythonSparkJobExecutableProps{
		GlueVersion: glue.GlueVersion_V3_0(),
		PythonVersion: glue.PythonVersion_THREE,
		// Local script path built with path/filepath; nil selects the default asset options.
		Script: glue.Code_FromAsset(jsii.String(filepath.Join("job-script", "hello_world.py")), nil),
	}),
})

The sparkUI property also lets you specify an S3 bucket and a bucket prefix.

See documentation for more information on adding jobs in Glue.

Enable Job Run Queuing

AWS Glue job queuing monitors your account level quotas and limits. If quotas or limits are insufficient to start a Glue job run, AWS Glue will automatically queue the job and wait for limits to free up. Once limits become available, AWS Glue will retry the job run. Glue jobs will queue for limits like max concurrent job runs per account, max concurrent Data Processing Units (DPU), and resource unavailable due to IP address exhaustion in Amazon Virtual Private Cloud (Amazon VPC).

Enable job run queuing by setting the jobRunQueuingEnabled property to true.

glue.NewJob(this, jsii.String("EnableRunQueuing"), &glue.JobProps{
	JobName: jsii.String("EtlJobWithRunQueuing"),
	Executable: glue.JobExecutable_PythonEtl(&glue.PythonSparkJobExecutableProps{
		GlueVersion: glue.GlueVersion_V4_0(),
		PythonVersion: glue.PythonVersion_THREE,
		// Local script path built with path/filepath; nil selects the default asset options.
		Script: glue.Code_FromAsset(jsii.String(filepath.Join("job-script", "hello_world.py")), nil),
	}),
	JobRunQueuingEnabled: jsii.Boolean(true),
})
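The queuing behavior described in this section can be pictured with a toy model in plain Go. This is only a sketch under stated assumptions: the scheduler type, its methods, and the single maxConcurrent limit are hypothetical and stand in for the account-level quotas mentioned above. It does not reflect how AWS Glue is implemented.

```go
package main

import "fmt"

// scheduler is a toy model of job run queuing, not AWS Glue's implementation.
// maxConcurrent stands in for an account-level limit (hypothetical).
type scheduler struct {
	maxConcurrent int
	running       map[string]bool
	waiting       []string
}

func newScheduler(maxConcurrent int) *scheduler {
	return &scheduler{maxConcurrent: maxConcurrent, running: map[string]bool{}}
}

// start runs the job if a slot is free; otherwise the run is queued.
func (s *scheduler) start(run string) string {
	if len(s.running) < s.maxConcurrent {
		s.running[run] = true
		return "RUNNING"
	}
	s.waiting = append(s.waiting, run)
	return "WAITING"
}

// finish frees the slot held by run and retries the oldest queued run.
// It returns the name of the run that was started, or "" if none was.
func (s *scheduler) finish(run string) string {
	if !s.running[run] {
		return "" // unknown run: nothing to free
	}
	delete(s.running, run)
	if len(s.waiting) == 0 {
		return ""
	}
	next := s.waiting[0]
	s.waiting = s.waiting[1:]
	s.running[next] = true
	return next
}

func main() {
	s := newScheduler(1)
	fmt.Println(s.start("run-1"))  // the only slot is free
	fmt.Println(s.start("run-2"))  // limit reached, so the run waits
	fmt.Println(s.finish("run-1")) // the freed slot goes to the queued run
}
```

Without queuing enabled, the second run in this model would correspond to a job run that fails to start instead of waiting.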

Connection

A Connection allows Glue jobs, crawlers and development endpoints to access certain types of data stores. For example, to create a network connection to connect to a data source within a VPC:

var securityGroup ec2.SecurityGroup // ec2 is the awsec2 package
var subnet ec2.Subnet

glue.NewConnection(this, jsii.String("MyConnection"), &glue.ConnectionProps{
	Type: glue.ConnectionType_NETWORK(),
	// The security groups granting AWS Glue inbound access to the data source within the VPC
	SecurityGroups: &[]ec2.ISecurityGroup{
		securityGroup,
	},
	// The VPC subnet which contains the data source
	Subnet: subnet,
})

For an RDS connection over JDBC, it is recommended to manage credentials using AWS Secrets Manager. To use a secret, specify SECRET_ID in the connection properties, as in the following code. Note that in this case, the subnet must have a route to the AWS Secrets Manager VPC endpoint or to the AWS Secrets Manager endpoint through a NAT gateway.

var securityGroup ec2.SecurityGroup // ec2 is the awsec2 package
var subnet ec2.Subnet
var db rds.DatabaseCluster // rds is the awsrds package

glue.NewConnection(this, jsii.String("RdsConnection"), &glue.ConnectionProps{
	Type: glue.ConnectionType_JDBC(),
	SecurityGroups: &[]ec2.ISecurityGroup{
		securityGroup,
	},
	Subnet: subnet,
	Properties: &map[string]*string{
		"JDBC_CONNECTION_URL": jsii.String(fmt.Sprintf("jdbc:mysql://%v/databasename", *db.ClusterEndpoint().SocketAddress())),
		"JDBC_ENFORCE_SSL": jsii.String("false"),
		"SECRET_ID": db.Secret().SecretName(),
	},
})

If you need to use a connection type that doesn't exist as a static member on ConnectionType, you can instantiate a ConnectionType object, e.g. glue.NewConnectionType(jsii.String("NEW_TYPE")).

See Adding a Connection to Your Data Store and Connection Structure documentation for more information on the supported data stores and their configurations.

SecurityConfiguration

A SecurityConfiguration is a set of security properties that can be used by AWS Glue to encrypt data at rest.

glue.NewSecurityConfiguration(this, jsii.String("MySecurityConfiguration"), &glue.SecurityConfigurationProps{
	CloudWatchEncryption: &glue.CloudWatchEncryption{
		Mode: glue.CloudWatchEncryptionMode_KMS,
	},
	JobBookmarksEncryption: &glue.JobBookmarksEncryption{
		Mode: glue.JobBookmarksEncryptionMode_CLIENT_SIDE_KMS,
	},
	S3Encryption: &glue.S3Encryption{
		Mode: glue.S3EncryptionMode_KMS,
	},
})

By default, a shared KMS key is created for use with the encryption configurations that require one. You can also supply your own key for each encryption config, for example, for CloudWatch encryption:

var key kms.Key // kms is the awskms package

glue.NewSecurityConfiguration(this, jsii.String("MySecurityConfiguration"), &glue.SecurityConfigurationProps{
	CloudWatchEncryption: &glue.CloudWatchEncryption{
		Mode: glue.CloudWatchEncryptionMode_KMS,
		KmsKey: key,
	},
})

See the documentation for more information on how Glue encrypts data written by Crawlers, Jobs, and Development Endpoints.

Database

A Database is a logical grouping of Tables in the Glue Catalog.

glue.NewDatabase(this, jsii.String("MyDatabase"), &glue.DatabaseProps{
	DatabaseName: jsii.String("my_database"),
	Description: jsii.String("my_database_description"),
})

Table

A Glue table describes a table of data in S3: its structure (column names and types), location of data (S3 objects with a common prefix in an S3 bucket), and format for the files (JSON, Avro, Parquet, etc.):

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
		&glue.Column{
			Name: jsii.String("col2"),
			Type: glue.Schema_Array(glue.Schema_STRING()),
			Comment: jsii.String("col2 is an array of strings"),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

By default, an S3 bucket will be created to store the table's data, but you can pass the bucket and s3Prefix yourself:

var myBucket s3.Bucket // s3 is the awss3 package
var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Bucket: myBucket,
	S3Prefix: jsii.String("my-table/"),
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Glue tables can be configured to contain user-defined properties, to describe the physical storage of table data, through the storageParameters property:

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	StorageParameters: &[]glue.StorageParameter{
		glue.StorageParameter_SkipHeaderLineCount(jsii.Number(1)),
		glue.StorageParameter_CompressionType(glue.CompressionType_GZIP),
		glue.StorageParameter_Custom(jsii.String("separatorChar"), jsii.String(",")),
	},
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Glue tables can also be configured to contain user-defined table properties through the parameters property:

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Parameters: &map[string]*string{
		"key1": jsii.String("val1"),
		"key2": jsii.String("val2"),
	},
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Partition Keys

To improve query performance, a table can specify partitionKeys on which data is stored and queried separately. For example, you might partition a table by year and month to optimize queries based on a time window:

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	PartitionKeys: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("year"),
			Type: glue.Schema_SMALL_INT(),
		},
		&glue.Column{
			Name: jsii.String("month"),
			Type: glue.Schema_SMALL_INT(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})
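To make "stored separately" more concrete, here is a minimal plain-Go sketch of how objects for one partition might be grouped under a common S3 prefix. The Hive-style key=value layout, the partitionValue type, and the partitionPrefix helper are assumptions made for illustration; the table definition above does not by itself mandate this layout.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionValue pairs a partition key name with its value for one partition.
type partitionValue struct {
	Key   string
	Value int
}

// partitionPrefix renders a Hive-style "key=value/" path under a table prefix.
// The Hive-style layout is an assumption for illustration only.
func partitionPrefix(tablePrefix string, parts []partitionValue) string {
	var b strings.Builder
	b.WriteString(tablePrefix)
	for _, p := range parts {
		fmt.Fprintf(&b, "%s=%d/", p.Key, p.Value)
	}
	return b.String()
}

func main() {
	prefix := partitionPrefix("my-table/", []partitionValue{
		{Key: "year", Value: 2024},
		{Key: "month", Value: 3},
	})
	fmt.Println(prefix)
}
```

A query restricted to one year and month then only needs the objects under that prefix, which is the time-window optimization the section describes.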

Partition Indexes

Another way to improve query performance is to specify partition indexes. If no partition indexes are present on the table, AWS Glue loads all partitions of the table and filters the loaded partitions using the query expression. The query takes more time to run as the number of partitions increases. With an index, the query will try to fetch a subset of the partitions instead of loading all partitions of the table.

The keys of a partition index must be a subset of the partition keys of the table. You can have a maximum of 3 partition indexes per table. To specify a partition index, you can use the partitionIndexes property:

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	PartitionKeys: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("year"),
			Type: glue.Schema_SMALL_INT(),
		},
		&glue.Column{
			Name: jsii.String("month"),
			Type: glue.Schema_SMALL_INT(),
		},
	},
	PartitionIndexes: &[]*glue.PartitionIndex{ // supply up to 3 indexes
		&glue.PartitionIndex{
			IndexName: jsii.String("my-index"), // optional
			KeyNames: &[]*string{
				jsii.String("year"),
			},
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Alternatively, you can call the addPartitionIndex() function on a table:

var myTable glue.S3Table

myTable.AddPartitionIndex(&glue.PartitionIndex{
	IndexName: jsii.String("my-index"),
	KeyNames: &[]*string{
		jsii.String("year"),
	},
})
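The two constraints stated in this section (index keys must be a subset of the table's partition keys, and a table can have at most 3 partition indexes) can be sketched as a standalone check in plain Go. The partitionIndex type and validateIndexes function below are hypothetical helpers for illustration, not part of the library.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPartitionIndexes is the per-table limit stated in the text.
const maxPartitionIndexes = 3

// partitionIndex mirrors only the two fields used in the examples above.
type partitionIndex struct {
	IndexName string
	KeyNames  []string
}

// validateIndexes checks that there are at most three indexes and that every
// index key is one of the table's partition keys.
func validateIndexes(partitionKeys []string, indexes []partitionIndex) error {
	if len(indexes) > maxPartitionIndexes {
		return errors.New("a table can have at most 3 partition indexes")
	}
	known := make(map[string]bool, len(partitionKeys))
	for _, k := range partitionKeys {
		known[k] = true
	}
	for _, idx := range indexes {
		for _, k := range idx.KeyNames {
			if !known[k] {
				return fmt.Errorf("index %q: key %q is not a partition key", idx.IndexName, k)
			}
		}
	}
	return nil
}

func main() {
	keys := []string{"year", "month"}
	ok := []partitionIndex{{IndexName: "my-index", KeyNames: []string{"year"}}}
	bad := []partitionIndex{{IndexName: "bad-index", KeyNames: []string{"day"}}}
	fmt.Println(validateIndexes(keys, ok))
	fmt.Println(validateIndexes(keys, bad))
}
```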

Partition Filtering

If you have a table with a large number of partitions that grows over time, consider using AWS Glue partition indexing and filtering.

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	PartitionKeys: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("year"),
			Type: glue.Schema_SMALL_INT(),
		},
		&glue.Column{
			Name: jsii.String("month"),
			Type: glue.Schema_SMALL_INT(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
	EnablePartitionFiltering: jsii.Boolean(true),
})

Glue Connections

Glue connections allow external data connections to third party databases and data warehouses. However, these connections can also be assigned to Glue Tables, allowing you to query external data sources using the Glue Data Catalog.

Whereas S3Table will point to (and if needed, create) a bucket to store the table's data, ExternalTable will point to an existing table in a data source. For example, to create a table in Glue that points to a table in Redshift:

var myConnection glue.Connection
var myDatabase glue.Database

glue.NewExternalTable(this, jsii.String("MyTable"), &glue.ExternalTableProps{
	Connection: myConnection,
	ExternalDataLocation: jsii.String("default_db_public_example"), // A table in Redshift
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Encryption

You can enable encryption on a Table's data:

S3Managed - (default) Server side encryption (SSE-S3) with an Amazon S3-managed key.

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_S3_MANAGED,
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Kms - Server-side encryption (SSE-KMS) with an AWS KMS Key managed by the account owner.

var myDatabase glue.Database

// KMS key is created automatically
glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_KMS,
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

// with an explicit KMS key
glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_KMS,
	EncryptionKey: kms.NewKey(this, jsii.String("MyKey"), nil), // nil uses the default key props
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

KmsManaged - Server-side encryption (SSE-KMS), like Kms, except with an AWS KMS Key managed by the AWS Key Management Service.

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_KMS_MANAGED,
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

ClientSideKms - Client-side encryption (CSE-KMS) with an AWS KMS Key managed by the account owner.

var myDatabase glue.Database

// KMS key is created automatically
glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_CLIENT_SIDE_KMS,
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

// with an explicit KMS key
glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Encryption: glue.TableEncryption_CLIENT_SIDE_KMS,
	EncryptionKey: kms.NewKey(this, jsii.String("MyKey"), nil), // nil uses the default key props
	// ...
	Database: myDatabase,
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

Note: you cannot provide a Bucket when creating the S3Table if you wish to use server-side encryption (KMS, KMS_MANAGED or S3_MANAGED).

Types

A table's schema is a collection of columns, each of which has a name and a type. Types are recursive structures, consisting of primitive and complex types:

var myDatabase glue.Database

glue.NewS3Table(this, jsii.String("MyTable"), &glue.S3TableProps{
	Columns: &[]*glue.Column{
		&glue.Column{
			Name: jsii.String("primitive_column"),
			Type: glue.Schema_STRING(),
		},
		&glue.Column{
			Name: jsii.String("array_column"),
			Type: glue.Schema_Array(glue.Schema_INTEGER()),
			Comment: jsii.String("array<integer>"),
		},
		&glue.Column{
			Name: jsii.String("map_column"),
			Type: glue.Schema_Map(glue.Schema_STRING(), glue.Schema_TIMESTAMP()),
			Comment: jsii.String("map<string,timestamp>"),
		},
		&glue.Column{
			Name: jsii.String("struct_column"),
			Type: glue.Schema_Struct(&[]*glue.Column{
				&glue.Column{
					Name: jsii.String("nested_column"),
					Type: glue.Schema_DATE(),
					Comment: jsii.String("nested comment"),
				},
			}),
			Comment: jsii.String("struct<nested_column:date COMMENT 'nested comment'>"),
		},
	},
	// ...
	Database: myDatabase,
	DataFormat: glue.DataFormat_JSON(),
})
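The column comments in the example above spell out the type strings for the complex columns. As a rough sketch of how such strings compose recursively, the following plain-Go helpers build them from already-rendered inner types. The names arrayOf, mapOf, structOf and the simplified column struct are hypothetical, and the primitive names are copied from those comments; the strings the library actually emits may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a simplified stand-in for the library's column struct.
type column struct {
	Name    string
	Type    string // already-rendered type string
	Comment string // optional
}

// arrayOf wraps an item type, e.g. "array<integer>".
func arrayOf(item string) string { return "array<" + item + ">" }

// mapOf combines a key type and a value type, e.g. "map<string,timestamp>".
func mapOf(key, value string) string { return "map<" + key + "," + value + ">" }

// structOf renders nested columns as "struct<name:type COMMENT '...'>".
func structOf(cols []column) string {
	fields := make([]string, 0, len(cols))
	for _, c := range cols {
		f := c.Name + ":" + c.Type
		if c.Comment != "" {
			f += fmt.Sprintf(" COMMENT '%s'", c.Comment)
		}
		fields = append(fields, f)
	}
	return "struct<" + strings.Join(fields, ",") + ">"
}

func main() {
	nested := structOf([]column{{Name: "nested_column", Type: "date", Comment: "nested comment"}})
	fmt.Println(arrayOf("integer"))
	fmt.Println(mapOf("string", "timestamp"))
	fmt.Println(nested)
	// Types nest to any depth because each helper returns a plain type string.
	fmt.Println(arrayOf(nested))
}
```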

Primitives

Numeric

Name	Type	Comments
FLOAT	Constant	A 32-bit single-precision floating point number
INTEGER	Constant	A 32-bit signed value in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31-1
DOUBLE	Constant	A 64-bit double-precision floating point number
BIG_INT	Constant	A 64-bit signed INTEGER in two's complement format, with a minimum value of -2^63 and a maximum value of 2^63-1
SMALL_INT	Constant	A 16-bit signed INTEGER in two's complement format, with a minimum value of -2^15 and a maximum value of 2^15-1
TINY_INT	Constant	An 8-bit signed INTEGER in two's complement format, with a minimum value of -2^7 and a maximum value of 2^7-1
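
The integer ranges in the table map directly onto Go's fixed-width integer limits. As a small sketch, the hypothetical helper below picks the narrowest integer type from the table whose range contains a given value; it is an illustration only and not part of the library.

```go
package main

import (
	"fmt"
	"math"
)

// smallestIntegerType returns the narrowest signed integer type from the
// table above whose range contains v.
func smallestIntegerType(v int64) string {
	switch {
	case v >= math.MinInt8 && v <= math.MaxInt8: // -2^7 .. 2^7-1
		return "TINY_INT"
	case v >= math.MinInt16 && v <= math.MaxInt16: // -2^15 .. 2^15-1
		return "SMALL_INT"
	case v >= math.MinInt32 && v <= math.MaxInt32: // -2^31 .. 2^31-1
		return "INTEGER"
	default: // everything else fits in -2^63 .. 2^63-1
		return "BIG_INT"
	}
}

func main() {
	for _, v := range []int64{12, 2024, 100000, 1 << 40} {
		fmt.Printf("%d fits in %s\n", v, smallestIntegerType(v))
	}
}
```

This is why the partition key examples use SMALL_INT for year and month: both comfortably fit in 16 bits.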

Date and time

Name	Type	Comments
DATE	Constant	A date in UNIX format, such as YYYY-MM-DD.
TIMESTAMP	Constant	Date and time instant in the UNIX format, such as yyyy-mm-dd hh:mm:ss[.f...]. For example, TIMESTAMP '2008-09-15 03:04:05.324'. This format uses the session time zone.

String

Name	Type	Comments
STRING	Constant	A string literal enclosed in single or double quotes
decimal(precision: number, scale?: number)	Function	`precision` is the total number of digits. `scale` (optional) is the number of digits in fractional part with a default of 0. For example, use these type definitions: decimal(11,5), decimal(15)
char(length: number)	Function	Fixed length character data, with a specified length between 1 and 255, such as char(10)
varchar(length: number)	Function	Variable length character data, with a specified length between 1 and 65535, such as varchar(10)
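
The length limits and the optional scale described in the table can be sketched as plain-Go helpers that validate their arguments and render the type strings shown in the Comments column. The function names charType, varcharType and decimalType are hypothetical and exist only for this illustration; in a CDK app you would call the library's Schema_Char, Schema_Varchar and Schema_Decimal functions instead.

```go
package main

import "fmt"

// charType renders "char(n)" for a length between 1 and 255.
func charType(length int) (string, error) {
	if length < 1 || length > 255 {
		return "", fmt.Errorf("char length must be between 1 and 255, got %d", length)
	}
	return fmt.Sprintf("char(%d)", length), nil
}

// varcharType renders "varchar(n)" for a length between 1 and 65535.
func varcharType(length int) (string, error) {
	if length < 1 || length > 65535 {
		return "", fmt.Errorf("varchar length must be between 1 and 65535, got %d", length)
	}
	return fmt.Sprintf("varchar(%d)", length), nil
}

// decimalType renders "decimal(p)" or "decimal(p,s)". The scale is optional,
// modeled here with a variadic parameter.
func decimalType(precision int, scale ...int) string {
	if len(scale) == 0 {
		return fmt.Sprintf("decimal(%d)", precision)
	}
	return fmt.Sprintf("decimal(%d,%d)", precision, scale[0])
}

func main() {
	fmt.Println(decimalType(11, 5))
	fmt.Println(decimalType(15))
	s, err := charType(10)
	fmt.Println(s, err)
	s, err = varcharType(70000) // out of range, so err is set
	fmt.Println(s, err)
}
```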

Miscellaneous

Name	Type	Comments
BOOLEAN	Constant	Values are `true` and `false`
BINARY	Constant	Value is in binary

Complex

Name	Type	Comments
array(itemType: Type)	Function	An array of some other type
map(keyType: Type, valueType: Type)	Function	A map of some primitive key type to any value type
struct(columns: Column[])	Function	Nested structure containing individually named and typed columns

Data Quality Ruleset

A DataQualityRuleset specifies a data quality ruleset with DQDL rules applied to a specified AWS Glue table. For example, to create a data quality ruleset for a given table:

glue.NewDataQualityRuleset(this, jsii.String("MyDataQualityRuleset"), &glue.DataQualityRulesetProps{
	ClientToken: jsii.String("client_token"),
	Description: jsii.String("description"),
	RulesetName: jsii.String("ruleset_name"),
	RulesetDqdl: jsii.String("ruleset_dqdl"),
	Tags: &map[string]*string{
		"key1": jsii.String("value1"),
		"key2": jsii.String("value2"),
	},
	TargetTable: glue.NewDataQualityTargetTable(jsii.String("database_name"), jsii.String("table_name")),
})

For more information, see AWS Glue Data Quality.

# Packages

jsii

Package jsii contains the functionality needed for jsii packages to initialize their dependencies and themselves.

# Functions

AssetCode_FromAsset

Job code from a local disk path.

AssetCode_FromBucket

Job code as an S3 object.

ClassificationString_AVRO

No description provided by the author

ClassificationString_CSV

No description provided by the author

ClassificationString_JSON

No description provided by the author

ClassificationString_ORC

No description provided by the author

ClassificationString_PARQUET

No description provided by the author

ClassificationString_XML

No description provided by the author

Code_FromAsset

Job code from a local disk path.

Code_FromBucket

Job code as an S3 object.

Connection_FromConnectionArn

Creates a Connection construct that represents an external connection.

Connection_FromConnectionName

Creates a Connection construct that represents an external connection.

Connection_IsConstruct

Checks if `x` is a construct.

Connection_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

Connection_IsResource

Check whether the given construct is a Resource.

ConnectionType_CUSTOM

No description provided by the author

ConnectionType_JDBC

No description provided by the author

ConnectionType_KAFKA

No description provided by the author

ConnectionType_MARKETPLACE

No description provided by the author

ConnectionType_MONGODB

No description provided by the author

ConnectionType_NETWORK

No description provided by the author

Database_FromDatabaseArn

Experimental.

Database_IsConstruct

Checks if `x` is a construct.

Database_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

Database_IsResource

Check whether the given construct is a Resource.

DataFormat_APACHE_LOGS

No description provided by the author

DataFormat_AVRO

No description provided by the author

DataFormat_CLOUDTRAIL_LOGS

No description provided by the author

DataFormat_CSV

No description provided by the author

DataFormat_JSON

No description provided by the author

DataFormat_LOGSTASH

No description provided by the author

DataFormat_ORC

No description provided by the author

DataFormat_PARQUET

No description provided by the author

DataFormat_TSV

No description provided by the author

DataQualityRuleset_FromRulesetArn

Experimental.

DataQualityRuleset_FromRulesetName

Experimental.

DataQualityRuleset_IsConstruct

Checks if `x` is a construct.

DataQualityRuleset_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

DataQualityRuleset_IsResource

Check whether the given construct is a Resource.

ExternalTable_FromTableArn

Experimental.

ExternalTable_FromTableAttributes

Creates a Table construct that represents an external table.

ExternalTable_IsConstruct

Checks if `x` is a construct.

ExternalTable_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

ExternalTable_IsResource

Check whether the given construct is a Resource.

GlueVersion_Of

Custom Glue version.

GlueVersion_V0_9

No description provided by the author

GlueVersion_V1_0

No description provided by the author

GlueVersion_V2_0

No description provided by the author

GlueVersion_V3_0

No description provided by the author

GlueVersion_V4_0

No description provided by the author

InputFormat_AVRO

No description provided by the author

InputFormat_CLOUDTRAIL

No description provided by the author

InputFormat_ORC

No description provided by the author

InputFormat_PARQUET

No description provided by the author

InputFormat_TEXT

No description provided by the author

Job_FromJobAttributes

Creates a Glue Job.

Job_IsConstruct

Checks if `x` is a construct.

Job_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

Job_IsResource

Check whether the given construct is a Resource.

JobExecutable_Of

Create a custom JobExecutable.

JobExecutable_PythonEtl

Create Python executable props for Apache Spark ETL job.

JobExecutable_PythonRay

Create Python executable props for Ray jobs.

JobExecutable_PythonShell

Create Python executable props for Python shell jobs.

JobExecutable_PythonStreaming

Create Python executable props for Apache Spark Streaming job.

JobExecutable_ScalaEtl

Create Scala executable props for Apache Spark ETL job.

JobExecutable_ScalaStreaming

Create Scala executable props for Apache Spark Streaming job.

JobType_ETL

No description provided by the author

JobType_Of

Custom type name.

JobType_PYTHON_SHELL

No description provided by the author

JobType_RAY

No description provided by the author

JobType_STREAMING

No description provided by the author

NewAssetCode

Experimental.

NewAssetCode_Override

Experimental.

NewClassificationString

Experimental.

NewClassificationString_Override

Experimental.

NewCode_Override

Experimental.

NewConnection

Experimental.

NewConnection_Override

Experimental.

NewConnectionType

Experimental.

NewConnectionType_Override

Experimental.

NewDatabase

Experimental.

NewDatabase_Override

Experimental.

NewDataFormat

Experimental.

NewDataFormat_Override

Experimental.

NewDataQualityRuleset

Experimental.

NewDataQualityRuleset_Override

Experimental.

NewDataQualityTargetTable

Experimental.

NewDataQualityTargetTable_Override

Experimental.

NewExternalTable

Experimental.

NewExternalTable_Override

Experimental.

NewInputFormat

Experimental.

NewInputFormat_Override

Experimental.

NewJob

Experimental.

NewJob_Override

Experimental.

NewOutputFormat

Experimental.

NewOutputFormat_Override

Experimental.

Experimental.

Experimental.

Experimental.

Experimental.

Experimental.

Experimental.

NewSecurityConfiguration

Experimental.

NewSecurityConfiguration_Override

Experimental.

NewSerializationLibrary

Experimental.

NewSerializationLibrary_Override

Experimental.

NewStorageParameter

Experimental.

NewStorageParameter_Override

Experimental.

NewTable

Deprecated: Use S3Table instead.

NewTable_Override

Deprecated: Use S3Table instead.

NewTableBase_Override

Experimental.

OutputFormat_AVRO

No description provided by the author

OutputFormat_HIVE_IGNORE_KEY_TEXT

No description provided by the author

OutputFormat_ORC

No description provided by the author

OutputFormat_PARQUET

No description provided by the author

Runtime_Of

Custom runtime.

Runtime_RAY_TWO_FOUR

No description provided by the author

S3Code_FromAsset

Job code from a local disk path.

S3Code_FromBucket

Job code as an S3 object.

S3Table_FromTableArn

Experimental.

S3Table_FromTableAttributes

Creates a Table construct that represents an external table.

S3Table_IsConstruct

Checks if `x` is a construct.

S3Table_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

S3Table_IsResource

Check whether the given construct is a Resource.

Schema_Array

Creates an array of some other type.

Schema_BIG_INT

No description provided by the author

Schema_BINARY

No description provided by the author

Schema_BOOLEAN

No description provided by the author

Schema_Char

Fixed length character data, with a specified length between 1 and 255.

Schema_DATE

No description provided by the author

Schema_Decimal

Creates a decimal type.

Schema_DOUBLE

No description provided by the author

Schema_FLOAT

No description provided by the author

Schema_INTEGER

No description provided by the author

Schema_Map

Creates a map of some primitive key type to some value type.

Schema_SMALL_INT

No description provided by the author

Schema_STRING

No description provided by the author

Schema_Struct

Creates a nested structure containing individually named and typed columns.

Schema_TIMESTAMP

No description provided by the author

Schema_TINY_INT

No description provided by the author

Schema_Varchar

Variable length character data, with a specified length between 1 and 65535.

SecurityConfiguration_FromSecurityConfigurationName

Creates a SecurityConfiguration construct that represents an external security configuration.

SecurityConfiguration_IsConstruct

Checks if `x` is a construct.

SecurityConfiguration_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

SecurityConfiguration_IsResource

Check whether the given construct is a Resource.

SerializationLibrary_AVRO

No description provided by the author

SerializationLibrary_CLOUDTRAIL

No description provided by the author

SerializationLibrary_GROK

No description provided by the author

SerializationLibrary_HIVE_JSON

No description provided by the author

SerializationLibrary_LAZY_SIMPLE

No description provided by the author

SerializationLibrary_OPEN_CSV

No description provided by the author

SerializationLibrary_OPENX_JSON

No description provided by the author

SerializationLibrary_ORC

No description provided by the author

SerializationLibrary_PARQUET

No description provided by the author

SerializationLibrary_REGEXP

No description provided by the author

StorageParameter_ColumnCountMismatchHandling

Identifies whether the file contains fewer or more values for a row than the number of columns specified in the external table definition.

StorageParameter_CompressionType

The type of compression used on the table, when the file name does not contain an extension.

StorageParameter_Custom

A custom storage parameter.

StorageParameter_DataCleansingEnabled

Determines whether data handling is on for the table.

StorageParameter_InvalidCharHandling

Specifies the action to perform when query results contain invalid UTF-8 character values.

StorageParameter_NumericOverflowHandling

Specifies the action to perform when ORC data contains an integer (for example, BIGINT or int64) that is larger than the column definition (for example, SMALLINT or int16).

StorageParameter_NumRows

A property that sets the numRows value for the table definition.

StorageParameter_OrcSchemaResolution

A property that sets the column mapping type for tables that use ORC data format.

StorageParameter_ReplacementChar

Specifies the replacement character to use when you set `INVALID_CHAR_HANDLING` to `REPLACE`.

StorageParameter_SerializationNullFormat

A property that specifies the text which, when it exactly matches the value in a field, causes NULL to be returned for that field.

StorageParameter_SkipHeaderLineCount

The number of rows to skip at the top of a CSV file when the table is being created.

StorageParameter_SurplusBytesHandling

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARBYTE data.

StorageParameter_SurplusCharHandling

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARCHAR, CHAR, or string data.

StorageParameter_WriteKmsKeyId

You can specify an AWS Key Management Service key to enable Server-Side Encryption (SSE) for Amazon S3 objects.

StorageParameter_WriteMaxFileSizeMb

A property that sets the maximum size (in MB) of each file written to Amazon S3 by CREATE EXTERNAL TABLE AS.

StorageParameter_WriteParallel

A property that sets whether CREATE EXTERNAL TABLE AS should write data in parallel.

Table_FromTableArn

Deprecated: Use S3Table instead.

Table_FromTableAttributes

Creates a Table construct that represents an external table.

Table_IsConstruct

Checks if `x` is a construct.

Table_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

Table_IsResource

Check whether the given construct is a Resource.

TableBase_FromTableArn

Experimental.

TableBase_FromTableAttributes

Creates a Table construct that represents an external table.

TableBase_IsConstruct

Checks if `x` is a construct.

TableBase_IsOwnedResource

Returns true if the construct was created by CDK, and false otherwise.

TableBase_IsResource

Check whether the given construct is a Resource.

WorkerType_G_025X

No description provided by the author

WorkerType_G_1X

No description provided by the author

WorkerType_G_2X

No description provided by the author

WorkerType_G_4X

No description provided by the author

WorkerType_G_8X

No description provided by the author

WorkerType_Of

Custom worker type.

WorkerType_STANDARD

No description provided by the author

WorkerType_Z_2X

No description provided by the author

# Constants

CloudWatchEncryptionMode_KMS

Server-side encryption (SSE) with an AWS KMS key managed by the account owner.

ColumnCountMismatchHandlingAction_DISABLED

Column count mismatch handling is turned off.

ColumnCountMismatchHandlingAction_DROP_ROW

Drop all rows that contain column count mismatch error from the scan.

ColumnCountMismatchHandlingAction_FAIL

Fail the query if the column count mismatch is detected.

ColumnCountMismatchHandlingAction_SET_TO_NULL

Fill missing values with NULL and ignore the additional values in each row.
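The four actions above can be pictured with a small stand-alone model. The sketch below is not part of this module and does not reproduce the query engine's implementation. The `mismatchAction` type, the `applyColumnCountMismatch` function, the use of `nil` pointers for NULL, and the pass-through behavior for DISABLED are all assumptions made for illustration. It only mirrors the descriptions listed here.

```go
package main

import (
	"errors"
	"fmt"
)

// mismatchAction is a hypothetical local stand-in for
// ColumnCountMismatchHandlingAction; it is not the library type.
type mismatchAction string

const (
	mismatchDisabled  mismatchAction = "DISABLED"
	mismatchFail      mismatchAction = "FAIL"
	mismatchSetToNull mismatchAction = "SET_TO_NULL"
	mismatchDropRow   mismatchAction = "DROP_ROW"
)

var errMismatch = errors.New("column count mismatch")

// applyColumnCountMismatch returns the resulting row (nil entries stand for
// NULL), whether the row is kept in the scan, and an error if the query fails.
func applyColumnCountMismatch(row []string, columns int, a mismatchAction) ([]*string, bool, error) {
	// Assumption: with handling disabled, or no mismatch, the row passes through.
	if a == mismatchDisabled || len(row) == columns {
		out := make([]*string, len(row))
		for i := range row {
			out[i] = &row[i]
		}
		return out, true, nil
	}
	switch a {
	case mismatchFail:
		return nil, false, errMismatch
	case mismatchDropRow:
		return nil, false, nil
	case mismatchSetToNull:
		// Missing values stay nil (NULL); additional values are ignored.
		out := make([]*string, columns)
		for i := 0; i < columns && i < len(row); i++ {
			out[i] = &row[i]
		}
		return out, true, nil
	}
	return nil, false, fmt.Errorf("unknown action %q", a)
}

func main() {
	short := []string{"a", "b"}
	long := []string{"a", "b", "c", "d"}
	for _, a := range []mismatchAction{mismatchFail, mismatchSetToNull, mismatchDropRow} {
		for _, row := range [][]string{short, long} {
			out, kept, err := applyColumnCountMismatch(row, 3, a)
			fmt.Println(a, len(row), kept, len(out), err)
		}
	}
}
```

In this model a short row is padded with NULLs and a long row is cut to the declared column count under SET_TO_NULL, while DROP_ROW discards the row without failing the query.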

CompressionType_BZIP2

Burrows-Wheeler compression.

CompressionType_GZIP

Deflate compression.

CompressionType_NONE

No compression.

CompressionType_SNAPPY

Compression algorithm focused on high compression and decompression speeds, rather than the maximum possible compression.

ExecutionClass_FLEX

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

ExecutionClass_STANDARD

The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

InvalidCharHandlingAction_DISABLED

Doesn't perform invalid character handling.

InvalidCharHandlingAction_DROP_ROW

Replaces each value in the row with null.

InvalidCharHandlingAction_FAIL

Cancels queries that return data containing invalid UTF-8 values.

InvalidCharHandlingAction_REPLACE

Replaces the invalid character with the replacement character you specify using `REPLACEMENT_CHAR`.

InvalidCharHandlingAction_SET_TO_NULL

Replaces invalid UTF-8 values with null.
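To make the per-value actions concrete, here is a minimal stand-alone sketch using only the Go standard library. It is not the library's or the query engine's code. The `invalidCharAction` type, `handleInvalidChars`, and `replaceInvalid` are hypothetical names, and replacing each invalid byte with one replacement character is an assumption about how REPLACE behaves. DROP_ROW is left out because it acts on the whole row rather than on a single value.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// invalidCharAction is a hypothetical local stand-in for InvalidCharHandlingAction.
type invalidCharAction string

const (
	actDisabled  invalidCharAction = "DISABLED"
	actFail      invalidCharAction = "FAIL"
	actSetToNull invalidCharAction = "SET_TO_NULL"
	actReplace   invalidCharAction = "REPLACE"
)

// replaceInvalid substitutes repl for every byte that is not part of a
// valid UTF-8 sequence and copies all valid runes unchanged.
func replaceInvalid(s string, repl rune) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			b.WriteRune(repl) // invalid byte
		} else {
			b.WriteRune(r)
		}
		i += size
	}
	return b.String()
}

// handleInvalidChars returns the resulting value (nil stands for NULL) or an
// error when the query should be cancelled.
func handleInvalidChars(value string, a invalidCharAction, repl rune) (*string, error) {
	if a == actDisabled || utf8.ValidString(value) {
		return &value, nil
	}
	switch a {
	case actFail:
		return nil, errors.New("value contains invalid UTF-8")
	case actSetToNull:
		return nil, nil
	case actReplace:
		out := replaceInvalid(value, repl)
		return &out, nil
	}
	return nil, fmt.Errorf("unsupported action %q", a)
}

func main() {
	raw := "caf\xff" // 0xff never appears in valid UTF-8
	out, _ := handleInvalidChars(raw, actReplace, '?')
	fmt.Println(*out)
}
```

The `repl` argument plays the role that `REPLACEMENT_CHAR` plays for the REPLACE action.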

JobBookmarksEncryptionMode_CLIENT_SIDE_KMS

Client-side encryption (CSE) with an AWS KMS key managed by the account owner.

JobLanguage_PYTHON

Python.

JobLanguage_SCALA

Scala.

JobState_FAILED

State indicating job run failed.

JobState_RUNNING

State indicating job is running.

JobState_STARTING

State indicating job is starting.

JobState_STOPPED

State indicating job stopped.

JobState_STOPPING

State indicating job is stopping.

JobState_SUCCEEDED

State indicating job run succeeded.

JobState_TIMEOUT

State indicating job run timed out.

MetricType_COUNT

An aggregate number.

MetricType_GAUGE

A value at a point in time.

NumericOverflowHandlingAction_DISABLED

Numeric overflow handling is turned off.

NumericOverflowHandlingAction_DROP_ROW

Set each value in the row to null.

NumericOverflowHandlingAction_FAIL

Cancel the query when the data includes a numeric overflow.

NumericOverflowHandlingAction_SET_TO_NULL

Set overflowing values to null.

OrcColumnMappingType_NAME

Map columns by name.

OrcColumnMappingType_POSITION

Map columns by position.

PythonVersion_THREE

Python 3 (the exact version depends on GlueVersion and JobCommand used).

PythonVersion_THREE_NINE

Python 3.9 (the exact version depends on GlueVersion and JobCommand used).

PythonVersion_TWO

Python 2 (the exact version depends on GlueVersion and JobCommand used).

S3EncryptionMode_KMS

Server-side encryption (SSE) with an AWS KMS key managed by the account owner.

S3EncryptionMode_S3_MANAGED

Server-side encryption (SSE) with an Amazon S3-managed key.

StorageParameters_COLUMN_COUNT_MISMATCH_HANDLING

Identifies if the file contains fewer or more values for a row than the number of columns specified in the external table definition.

StorageParameters_COMPRESSION_TYPE

The type of compression used on the table, when the file name does not contain an extension.

StorageParameters_DATA_CLEANSING_ENABLED

Determines whether data handling is on for the table.

StorageParameters_INVALID_CHAR_HANDLING

Specifies the action to perform when query results contain invalid UTF-8 character values.

StorageParameters_NUM_ROWS

A property that sets the numRows value for the table definition.

StorageParameters_NUMERIC_OVERFLOW_HANDLING

Specifies the action to perform when ORC data contains an integer (for example, BIGINT or int64) that is larger than the column definition (for example, SMALLINT or int16).

StorageParameters_ORC_SCHEMA_RESOLUTION

A property that sets the column mapping type for tables that use ORC data format.

StorageParameters_REPLACEMENT_CHAR

Specifies the replacement character to use when you set `INVALID_CHAR_HANDLING` to `REPLACE`.

StorageParameters_SERIALIZATION_NULL_FORMAT

A property that specifies that a NULL value should be returned when a field exactly matches the supplied text.

StorageParameters_SKIP_HEADER_LINE_COUNT

The number of rows to skip at the top of a CSV file when the table is being created.

StorageParameters_SURPLUS_BYTES_HANDLING

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARBYTE data.

StorageParameters_SURPLUS_CHAR_HANDLING

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARCHAR, CHAR, or string data.

StorageParameters_WRITE_KMS_KEY_ID

You can specify an AWS Key Management Service key to enable server-side encryption (SSE) for Amazon S3 objects.

StorageParameters_WRITE_MAX_FILESIZE_MB

A property that sets the maximum size (in MB) of each file written to Amazon S3 by CREATE EXTERNAL TABLE AS.

StorageParameters_WRITE_PARALLEL

A property that sets whether CREATE EXTERNAL TABLE AS should write data in parallel.

SurplusBytesHandlingAction_DISABLED

Doesn't perform surplus byte handling.

SurplusBytesHandlingAction_DROP_ROW

Drop all rows that contain data exceeding column width.

SurplusBytesHandlingAction_FAIL

Cancels queries that return data exceeding the column width.

SurplusBytesHandlingAction_SET_TO_NULL

Replaces data that exceeds the column width with null.

SurplusBytesHandlingAction_TRUNCATE

Removes the characters that exceed the maximum number of characters defined for the column.

SurplusCharHandlingAction_DISABLED

Doesn't perform surplus character handling.

SurplusCharHandlingAction_DROP_ROW

Replaces each value in the row with null.

SurplusCharHandlingAction_FAIL

Cancels queries that return data exceeding the column width.

SurplusCharHandlingAction_SET_TO_NULL

Replaces data that exceeds the column width with null.

SurplusCharHandlingAction_TRUNCATE

Removes the characters that exceed the maximum number of characters defined for the column.
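The surplus-character actions can be sketched the same way. This is an illustrative model only: `surplusCharAction` and `handleSurplusChars` are hypothetical names, and measuring the column width in characters (Go runes) rather than bytes is an assumption taken from the wording of the TRUNCATE description above. DROP_ROW is omitted because it applies to the whole row.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// surplusCharAction is a hypothetical local stand-in for SurplusCharHandlingAction.
type surplusCharAction string

const (
	surplusDisabled  surplusCharAction = "DISABLED"
	surplusFail      surplusCharAction = "FAIL"
	surplusSetToNull surplusCharAction = "SET_TO_NULL"
	surplusTruncate  surplusCharAction = "TRUNCATE"
)

// handleSurplusChars applies the action to a value for a column that holds at
// most maxChars characters (maxChars is assumed to be non-negative).
// A nil result stands for NULL; an error means the query is cancelled.
func handleSurplusChars(value string, maxChars int, a surplusCharAction) (*string, error) {
	if a == surplusDisabled || utf8.RuneCountInString(value) <= maxChars {
		return &value, nil
	}
	switch a {
	case surplusFail:
		return nil, errors.New("value exceeds column width")
	case surplusSetToNull:
		return nil, nil
	case surplusTruncate:
		// Cut on rune boundaries so multi-byte characters are never split.
		out := string([]rune(value)[:maxChars])
		return &out, nil
	}
	return nil, fmt.Errorf("unsupported action %q", a)
}

func main() {
	out, _ := handleSurplusChars("héllo wörld", 5, surplusTruncate)
	fmt.Println(*out)
}
```

A byte-oriented variant for VARBYTE columns (SurplusBytesHandlingAction) would slice the raw bytes instead of the runes.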

TableEncryption_CLIENT_SIDE_KMS

Client-side encryption (CSE) with an AWS KMS key managed by the account owner.

TableEncryption_KMS

Server-side encryption (SSE) with an AWS KMS key managed by the account owner.

TableEncryption_KMS_MANAGED

Server-side encryption (SSE) with an AWS KMS key managed by the KMS service.

TableEncryption_S3_MANAGED

Server-side encryption (SSE) with an Amazon S3-managed key.

WriteParallel_OFF

Write data serially.

WriteParallel_ON

Write data in parallel.

# Structs

CloudWatchEncryption

CloudWatch Logs encryption configuration.

CodeConfig

Result of binding `Code` into a `Job`.

Column

A column of a table.

ConnectionOptions

Base Connection Options.

ConnectionProps

Construction properties for `Connection`.

ContinuousLoggingProps

Properties for enabling Continuous Logging for Glue Jobs.

DatabaseProps

Example:

glue.NewDatabase(this, jsii.String("MyDatabase"), &DatabaseProps{
	DatabaseName: jsii.String("my_database"),
	Description: jsii.String("my_database_description"),
})

Experimental.

DataFormatProps

Properties of a DataFormat instance.

DataQualityRulesetProps

Construction properties for `DataQualityRuleset`.

ExternalTableProps

Example: var myConnection connection var myDatabase database glue.NewExternalTable(this, jsii.String("MyTable"), &ExternalTableProps{ Connection: myConnection, ExternalDataLocation: jsii.String("default_db_public_example"), // A table in Redshift // ..

JobAttributes

Attributes for importing `Job`.

JobBookmarksEncryption

Job bookmarks encryption configuration.

JobExecutableConfig

Result of binding a `JobExecutable` into a `Job`.

JobProps

Construction properties for `Job`.

PartitionIndex

Properties of a Partition Index.

PythonRayExecutableProps

Props for creating a Python Ray job executable.

PythonShellExecutableProps

Props for creating a Python shell job executable.

PythonSparkJobExecutableProps

Props for creating a Python Spark (ETL or Streaming) job executable.

S3Encryption

S3 encryption configuration.

S3TableProps

Example:

var myDatabase database

glue.NewS3Table(this, jsii.String("MyTable"), &S3TableProps{
	Database: myDatabase,
	Columns: []*column{
		&column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	PartitionKeys: []*column{
		&column{
			Name: jsii.String("year"),
			Type: glue.Schema_SMALL_INT(),
		},
		&column{
			Name: jsii.String("month"),
			Type: glue.Schema_SMALL_INT(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
	EnablePartitionFiltering: jsii.Boolean(true),
})

Experimental.

ScalaJobExecutableProps

Props for creating a Scala Spark (ETL or Streaming) job executable.

SecurityConfigurationProps

Construction properties of `SecurityConfiguration`.

SparkUILoggingLocation

The Spark UI logging location.

SparkUIProps

Properties for enabling Spark UI monitoring feature for Spark-based Glue jobs.

TableAttributes

Example: // The code below shows an example of how to instantiate this type.

TableBaseProps

Example: // The code below shows an example of how to instantiate this type.

TableProps

Example: // The code below shows an example of how to instantiate this type.

Type

Represents a type of a column in a table schema.

# Interfaces

AssetCode

Job Code from a local file.

ClassificationString

Classification string given to tables with this data format.

Code

Represents a Glue Job's Code assets (an asset can be a script, a jar, a Python file or any other file).

Connection

An AWS Glue connection to a data source.

ConnectionType

The type of the Glue connection.

Database

A Glue database.

DataFormat

Defines the input/output formats and ser/de for a single DataFormat.

DataQualityRuleset

A Glue Data Quality ruleset.

DataQualityTargetTable

Properties of a DataQualityTargetTable.

ExternalTable

A Glue table that targets an external data location.

GlueVersion

AWS Glue version determines the versions of Apache Spark and Python that are available to the job.

IConnection

Interface representing a created or an imported `Connection`.

IDatabase

Experimental.

IDataQualityRuleset

Experimental.

IJob

Interface representing a created or an imported `Job`.

InputFormat

Absolute class name of the Hadoop `InputFormat` to use when reading table files.

ISecurityConfiguration

Interface representing a created or an imported `SecurityConfiguration`.

ITable

Experimental.

Job

A Glue Job.

JobExecutable

The executable properties related to the Glue job's GlueVersion, JobType and code.

JobType

The job type.

OutputFormat

Absolute class name of the Hadoop `OutputFormat` to use when writing table files.

Runtime

AWS Glue runtime determines the runtime engine of the job.

S3Code

Glue job Code from an S3 bucket.

S3Table

A Glue table that targets an S3 dataset.

Schema

Example:

var myDatabase database

glue.NewS3Table(this, jsii.String("MyTable"), &S3TableProps{
	Database: myDatabase,
	Columns: []*column{
		&column{
			Name: jsii.String("col1"),
			Type: glue.Schema_STRING(),
		},
	},
	PartitionKeys: []*column{
		&column{
			Name: jsii.String("year"),
			Type: glue.Schema_SMALL_INT(),
		},
		&column{
			Name: jsii.String("month"),
			Type: glue.Schema_SMALL_INT(),
		},
	},
	DataFormat: glue.DataFormat_JSON(),
})

See: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Experimental.

SecurityConfiguration

A security configuration is a set of security properties that can be used by AWS Glue to encrypt data at rest.

SerializationLibrary

Serialization library to use when serializing/deserializing (SerDe) table records.

StorageParameter

A storage parameter.

Table

A Glue table.

TableBase

A Glue table.

WorkerType

The type of predefined worker that is allocated when a job runs.

# Type aliases

CloudWatchEncryptionMode

Encryption mode for CloudWatch Logs.

ColumnCountMismatchHandlingAction

Identifies if the file contains fewer or more values for a row than the number of columns specified in the external table definition.

CompressionType

The compression type.

ExecutionClass

Specifies whether the job runs with a standard or flexible execution class.

InvalidCharHandlingAction

Specifies the action to perform when query results contain invalid UTF-8 character values.

JobBookmarksEncryptionMode

Encryption mode for Job Bookmarks.

JobLanguage

Runtime language of the Glue job.

JobState

Job states emitted by Glue to CloudWatch Events.

MetricType

The Glue CloudWatch metric type.

NumericOverflowHandlingAction

Specifies the action to perform when ORC data contains an integer (for example, BIGINT or int64) that is larger than the column definition (for example, SMALLINT or int16).

OrcColumnMappingType

Specifies how to map columns when the table uses ORC data format.

PythonVersion

Python version.

S3EncryptionMode

Encryption mode for S3.

StorageParameters

The storage parameter keys that are currently known. This list is not exhaustive, and other keys may be used.

SurplusBytesHandlingAction

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARBYTE data.

SurplusCharHandlingAction

Specifies how to handle data being loaded that exceeds the length of the data type defined for columns containing VARCHAR, CHAR, or string data.

TableEncryption

Encryption options for a Table.

WriteParallel

Specifies whether CREATE EXTERNAL TABLE AS should write data in parallel.