github.com/aws/aws-cdk-go/awscdkawsgluealpha/v2
Version: 2.0.0-rc.24
Repository: https://github.com/aws/aws-cdk-go.git
Documentation: pkg.go.dev

# README

AWS Glue Construct Library


cfn-resources: Stable

All classes with the Cfn prefix in this module (CFN Resources) are always stable and safe to use.

cdk-constructs: Experimental

The APIs of higher level constructs in this module are experimental and under active development. They are subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model and breaking changes will be announced in the release notes. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


This module is part of the AWS Cloud Development Kit project.

Job

A Job encapsulates a script that connects to data sources, processes them, and then writes output to a data target.

There are 3 types of jobs supported by AWS Glue: Spark ETL, Spark Streaming, and Python Shell jobs.

glue.JobExecutable lets you specify the type of job, the language to use, and the code assets required by the job.

glue.Code allows you to refer to the different code assets required by the job, either from an existing S3 location or from a local file path.
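The job type and language combinations can be summarized in a small TypeScript sketch. It is a hypothetical model, not part of the library. The factory names follow the `glue.JobExecutable` methods used in the examples below (`scalaEtl`, `pythonStreaming`, `pythonShell`) and, by assumption, `scalaStreaming` and `pythonEtl` for the remaining combinations listed in the function index. The `JobKind` and `Language` types and the `factoryFor` helper are illustrative only.

```typescript
// Hypothetical model of the job kind / language combinations offered by
// the glue.JobExecutable factory methods. Not the library's API.
type JobKind = "ETL" | "STREAMING" | "PYTHON_SHELL";
type Language = "SCALA" | "PYTHON";

// Scala has no Python shell variant; Python covers all three job kinds.
const factories: Record<Language, Partial<Record<JobKind, string>>> = {
  SCALA: { ETL: "scalaEtl", STREAMING: "scalaStreaming" },
  PYTHON: { ETL: "pythonEtl", STREAMING: "pythonStreaming", PYTHON_SHELL: "pythonShell" },
};

// Returns the factory method name for a job kind and language,
// or throws when that combination is not offered.
function factoryFor(kind: JobKind, language: Language): string {
  const name = factories[language][kind];
  if (name === undefined) {
    throw new Error(`No JobExecutable factory for ${language} ${kind} jobs`);
  }
  return name;
}

console.log(factoryFor("STREAMING", "PYTHON")); // pythonStreaming
```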

Spark Jobs

These jobs run in an Apache Spark environment managed by AWS Glue.

ETL Jobs

An ETL job processes data in batches using Apache Spark.

```typescript
import * as glue from '@aws-cdk/aws-glue-alpha';

// Assumes `stack` (a Stack) and `bucket` (an s3.IBucket holding the job code) already exist.
new glue.Job(stack, 'ScalaSparkEtlJob', {
  executable: glue.JobExecutable.scalaEtl({
    glueVersion: glue.GlueVersion.V2_0,
    script: glue.Code.fromBucket(bucket, 'src/com/example/HelloWorld.scala'),
    className: 'com.example.HelloWorld',
    extraJars: [glue.Code.fromBucket(bucket, 'jars/HelloWorld.jar')],
  }),
  description: 'an example Scala ETL job',
});
```

Streaming Jobs

A Streaming job is similar to an ETL job, except that it performs ETL on data streams. It uses the Apache Spark Structured Streaming framework. Some Spark job features are not available to streaming ETL jobs.

```typescript
import * as path from 'path';

new glue.Job(stack, 'PythonSparkStreamingJob', {
  executable: glue.JobExecutable.pythonStreaming({
    glueVersion: glue.GlueVersion.V2_0,
    pythonVersion: glue.PythonVersion.THREE,
    // Job code from a local file path, packaged as an asset
    script: glue.Code.fromAsset(path.join(__dirname, 'job-script/hello_world.py')),
  }),
  description: 'an example Python Streaming job',
});
```

Python Shell Jobs

A Python shell job runs Python scripts as a shell and supports a Python version that depends on the AWS Glue version you are using. This can be used to schedule and run tasks that don't require an Apache Spark environment.

```typescript
new glue.Job(stack, 'PythonShellJob', {
  executable: glue.JobExecutable.pythonShell({
    glueVersion: glue.GlueVersion.V1_0,
    pythonVersion: glue.PythonVersion.THREE,
    script: glue.Code.fromBucket(bucket, 'script.py'),
  }),
  description: 'an example Python Shell job',
});
```

See the AWS Glue documentation on adding jobs for more information.

Connection

A Connection allows Glue jobs, crawlers and development endpoints to access certain types of data stores. For example, to create a network connection to connect to a data source within a VPC:

```typescript
new glue.Connection(stack, 'MyConnection', {
  connectionType: glue.ConnectionType.NETWORK,
  // The security groups granting AWS Glue inbound access to the data source within the VPC
  securityGroups: [securityGroup],
  // The VPC subnet which contains the data source
  subnet,
});
```

If you need a connection type that is not available as a static member of ConnectionType, you can instantiate ConnectionType directly, e.g., `new glue.ConnectionType('NEW_TYPE')`.
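ConnectionType follows a common pattern: predefined values as static members, plus a public constructor as an escape hatch for anything else. The simplified class below is a hypothetical model of that pattern, not the library's source. Only the `NETWORK` member and the `new glue.ConnectionType('NEW_TYPE')` usage come from this README; the `JDBC` member and the `name` property are illustrative assumptions.

```typescript
// Hypothetical model of the "static members plus escape hatch" pattern.
class ConnectionType {
  // Predefined values exposed as static instances.
  static readonly JDBC = new ConnectionType("JDBC");
  static readonly NETWORK = new ConnectionType("NETWORK");

  // Public constructor: any other connection type can be wrapped directly.
  constructor(public readonly name: string) {}

  // In this sketch, the string form is simply the wrapped name.
  toString(): string {
    return this.name;
  }
}

const builtIn = ConnectionType.NETWORK;
const custom = new ConnectionType("NEW_TYPE");
console.log(`${builtIn} ${custom}`); // NETWORK NEW_TYPE
```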

See the AWS Glue documentation pages "Adding a Connection to Your Data Store" and "Connection Structure" for more information on the supported data stores and their configurations.

SecurityConfiguration

A SecurityConfiguration is a set of security properties that can be used by AWS Glue to encrypt data at rest.

```typescript
new glue.SecurityConfiguration(stack, 'MySecurityConfiguration', {
  securityConfigurationName: 'name',
  cloudWatchEncryption: {
    mode: glue.CloudWatchEncryptionMode.KMS,
  },
  jobBookmarksEncryption: {
    mode: glue.JobBookmarksEncryptionMode.CLIENT_SIDE_KMS,
  },
  s3Encryption: {
    mode: glue.S3EncryptionMode.KMS,
  },
});
```

By default, a shared KMS key is created for use with the encryption configurations that require one. You can also supply your own key for each encryption config, for example, for CloudWatch encryption:

```typescript
new glue.SecurityConfiguration(stack, 'MySecurityConfiguration', {
  securityConfigurationName: 'name',
  cloudWatchEncryption: {
    mode: glue.CloudWatchEncryptionMode.KMS,
    kmsKey: key,
  },
});
```
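The default-key behavior described above can be modeled in a few lines. This is a hypothetical sketch, not the construct's implementation: the `Mode` strings, the `EncryptionSetting` shape, and the `resolveKeys` helper are assumptions, and keys are plain strings rather than `kms.IKey` objects. It shows one shared key created lazily and only when some setting needs a KMS key without supplying its own.

```typescript
// Hypothetical model: keys are plain strings here, not kms.IKey objects.
type Mode = "KMS" | "CLIENT_SIDE_KMS" | "S3_MANAGED";

interface EncryptionSetting {
  mode: Mode;
  kmsKey?: string; // caller-supplied key, if any
}

// Modes that need a KMS key in this model.
const NEEDS_KEY: ReadonlySet<Mode> = new Set<Mode>(["KMS", "CLIENT_SIDE_KMS"]);

// Resolves the key each setting uses. A single shared key is created lazily,
// and only if some setting needs a KMS key without supplying its own.
function resolveKeys(
  settings: Record<string, EncryptionSetting>,
  createKey: () => string,
): Record<string, string | undefined> {
  let sharedKey: string | undefined;
  const resolved: Record<string, string | undefined> = {};
  for (const [name, setting] of Object.entries(settings)) {
    if (!NEEDS_KEY.has(setting.mode)) {
      resolved[name] = undefined; // e.g. S3-managed keys: no KMS key involved
    } else if (setting.kmsKey !== undefined) {
      resolved[name] = setting.kmsKey; // an explicit key takes precedence
    } else {
      if (sharedKey === undefined) {
        sharedKey = createKey(); // created at most once
      }
      resolved[name] = sharedKey;
    }
  }
  return resolved;
}

const keys = resolveKeys(
  { cloudWatch: { mode: "KMS" }, s3: { mode: "S3_MANAGED" } },
  () => "shared-key",
);
console.log(keys.cloudWatch); // shared-key
```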

See the AWS Glue documentation on encrypting data written by crawlers, jobs, and development endpoints for more information.

Database

A Database is a logical grouping of Tables in the Glue Catalog.

```typescript
new glue.Database(stack, 'MyDatabase', {
  databaseName: 'my_database',
});
```

Table

A Glue table describes a table of data in S3: its structure (column names and types), the location of the data (S3 objects with a common prefix in an S3 bucket), and the format of the files (JSON, Avro, Parquet, etc.):

```typescript
new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }, {
    name: 'col2',
    type: glue.Schema.array(glue.Schema.STRING),
    comment: 'col2 is an array of strings', // comment is optional
  }],
  dataFormat: glue.DataFormat.JSON,
});
```

By default, an S3 bucket is created to store the table's data, and the data is stored at the bucket root. You can also pass the bucket and s3Prefix explicitly:

```typescript
new glue.Table(stack, 'MyTable', {
  bucket: myBucket,
  s3Prefix: 'my-table/',
  // ...
});
```

Partitions

To improve query performance, a table can specify partitionKeys on which data is stored and queried separately. For example, you might partition a table by year and month to optimize queries based on a time window:

```typescript
new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }],
  partitionKeys: [{
    name: 'year',
    type: glue.Schema.SMALL_INT,
  }, {
    name: 'month',
    type: glue.Schema.SMALL_INT,
  }],
  dataFormat: glue.DataFormat.JSON,
});
```
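A partitioned table's data usually lives under one S3 prefix per combination of partition values. Assuming a Hive-style layout (`key=value/` per partition key, in `partitionKeys` order), which this README does not itself mandate, a hypothetical helper could build the prefix for the year/month example like this. The construct only declares the partition keys; it does not write data.

```typescript
// Hypothetical helper: S3 key prefix for one partition, assuming a
// Hive-style `key=value/` layout in the same order as partitionKeys.
function partitionPrefix(
  s3Prefix: string,
  partitionKeys: string[],
  values: Record<string, string | number>,
): string {
  const parts = partitionKeys.map((key) => {
    const value = values[key];
    if (value === undefined) {
      throw new Error(`Missing value for partition key '${key}'`);
    }
    return `${key}=${value}/`;
  });
  return s3Prefix + parts.join("");
}

console.log(partitionPrefix("my-table/", ["year", "month"], { year: 2021, month: 6 }));
// my-table/year=2021/month=6/
```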

Encryption

You can enable encryption on a Table's data:

  • Unencrypted - files are not encrypted. This is the default encryption setting.
  • S3Managed - server-side encryption (SSE-S3) with an Amazon S3-managed key.

```typescript
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.S3_MANAGED,
  // ...
});
```

  • Kms - server-side encryption (SSE-KMS) with an AWS KMS key managed by the account owner.

```typescript
import * as kms from 'aws-cdk-lib/aws-kms';

// KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS,
  // ...
});

// Alternatively, with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS,
  encryptionKey: new kms.Key(stack, 'MyKey'),
  // ...
});
```

  • KmsManaged - server-side encryption (SSE-KMS), like Kms, except with an AWS KMS key managed by the AWS Key Management Service.

```typescript
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS_MANAGED,
  // ...
});
```

  • ClientSideKms - client-side encryption (CSE-KMS) with an AWS KMS key managed by the account owner.

```typescript
// KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
  // ...
});

// Alternatively, with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
  encryptionKey: new kms.Key(stack, 'MyKey'),
  // ...
});
```

Note: you cannot provide a Bucket when creating the Table if you wish to use server-side encryption (KMS, KMS_MANAGED or S3_MANAGED).
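The encryption options and the bucket restriction can be captured in a small decision function. This is a hypothetical model of the rules stated in this section, not the construct's code; the string union and the `createsKmsKey` helper are illustrative assumptions.

```typescript
// Hypothetical model of the TableEncryption rules described above.
type TableEncryption = "UNENCRYPTED" | "S3_MANAGED" | "KMS" | "KMS_MANAGED" | "CLIENT_SIDE_KMS";

// Server-side options: a caller-supplied bucket is not allowed with these.
const SERVER_SIDE: TableEncryption[] = ["S3_MANAGED", "KMS", "KMS_MANAGED"];
// Options that use a KMS key managed by the account owner.
const ACCOUNT_OWNED_KEY: TableEncryption[] = ["KMS", "CLIENT_SIDE_KMS"];

// Returns true if a new KMS key would be created for the table, and throws
// for the bucket plus server-side encryption combination the note forbids.
function createsKmsKey(
  encryption: TableEncryption,
  opts: { bucketProvided: boolean; keyProvided: boolean },
): boolean {
  if (opts.bucketProvided && SERVER_SIDE.includes(encryption)) {
    throw new Error(`A bucket cannot be provided with server-side encryption (${encryption})`);
  }
  return ACCOUNT_OWNED_KEY.includes(encryption) && !opts.keyProvided;
}

console.log(createsKmsKey("KMS", { bucketProvided: false, keyProvided: false })); // true
```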

Types

A table's schema is a collection of columns, each of which has a name and a type. Types are recursive structures, consisting of primitive and complex types:

```typescript
new glue.Table(stack, 'MyTable', {
  columns: [{
    name: 'primitive_column',
    type: glue.Schema.STRING,
  }, {
    name: 'array_column',
    type: glue.Schema.array(glue.Schema.INTEGER),
    comment: 'array<integer>',
  }, {
    name: 'map_column',
    type: glue.Schema.map(
      glue.Schema.STRING,
      glue.Schema.TIMESTAMP),
    comment: 'map<string,timestamp>',
  }, {
    name: 'struct_column',
    type: glue.Schema.struct([{
      name: 'nested_column',
      type: glue.Schema.DATE,
      comment: 'nested comment',
    }]),
    comment: "struct<nested_column:date COMMENT 'nested comment'>",
  }],
  // ...
});
```
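To make the recursive composition concrete, here is a hypothetical TypeScript model of how such types could render into the Hive-style strings used in the column comments. It is not the `glue.Schema` implementation: the `GlueType` shape, rendering `INTEGER` as `int`, and the primitive-key check in `map` (based on "a map of some primitive key type" in the Complex table) are assumptions.

```typescript
// Simplified, hypothetical model of composable column types.
interface GlueType {
  isPrimitive: boolean;
  inputString: string; // Hive-style type string
}

interface Column {
  name: string;
  type: GlueType;
  comment?: string;
}

const primitive = (inputString: string): GlueType => ({ isPrimitive: true, inputString });

const Schema = {
  STRING: primitive("string"),
  INTEGER: primitive("int"),
  DATE: primitive("date"),
  TIMESTAMP: primitive("timestamp"),

  array(itemType: GlueType): GlueType {
    return { isPrimitive: false, inputString: `array<${itemType.inputString}>` };
  },

  // Map keys must be primitive types; values can be any type.
  map(keyType: GlueType, valueType: GlueType): GlueType {
    if (!keyType.isPrimitive) {
      throw new Error(`Map key must be a primitive type, got ${keyType.inputString}`);
    }
    return { isPrimitive: false, inputString: `map<${keyType.inputString},${valueType.inputString}>` };
  },

  // Nested columns render as name:type, with an optional COMMENT clause.
  struct(columns: Column[]): GlueType {
    const fields = columns.map((c) =>
      c.comment === undefined
        ? `${c.name}:${c.type.inputString}`
        : `${c.name}:${c.type.inputString} COMMENT '${c.comment}'`,
    );
    return { isPrimitive: false, inputString: `struct<${fields.join(",")}>` };
  },
};

console.log(Schema.map(Schema.STRING, Schema.TIMESTAMP).inputString); // map<string,timestamp>
```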

Primitives

Numeric

| Name | Type | Comments |
| --- | --- | --- |
| FLOAT | Constant | A 32-bit single-precision floating point number |
| INTEGER | Constant | A 32-bit signed value in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31-1 |
| DOUBLE | Constant | A 64-bit double-precision floating point number |
| BIG_INT | Constant | A 64-bit signed integer in two's complement format, with a minimum value of -2^63 and a maximum value of 2^63-1 |
| SMALL_INT | Constant | A 16-bit signed integer in two's complement format, with a minimum value of -2^15 and a maximum value of 2^15-1 |
| TINY_INT | Constant | An 8-bit signed integer in two's complement format, with a minimum value of -2^7 and a maximum value of 2^7-1 |
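The integer ranges above follow the general two's complement rule for an n-bit signed integer: from -2^(n-1) to 2^(n-1)-1. The short sketch below is illustrative only, not part of the library; it computes the bounds with BigInt so that the 64-bit values stay exact, since they exceed `Number.MAX_SAFE_INTEGER`.

```typescript
// Range of an n-bit signed integer in two's complement: [-2^(n-1), 2^(n-1) - 1].
function signedRange(bits: number): { min: bigint; max: bigint } {
  const half = 1n << BigInt(bits - 1); // 2^(n-1)
  return { min: -half, max: half - 1n };
}

const numericTypes: Array<[string, number]> = [
  ["TINY_INT", 8],
  ["SMALL_INT", 16],
  ["INTEGER", 32],
  ["BIG_INT", 64],
];

for (const [name, bits] of numericTypes) {
  const { min, max } = signedRange(bits);
  console.log(`${name}: ${min} .. ${max}`);
}
```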

Date and time

| Name | Type | Comments |
| --- | --- | --- |
| DATE | Constant | A date in UNIX format, such as YYYY-MM-DD. |
| TIMESTAMP | Constant | Date and time instant in the UNIX format, such as yyyy-mm-dd hh:mm:ss[.f...]. For example, TIMESTAMP '2008-09-15 03:04:05.324'. This format uses the session time zone. |
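As an illustration of the TIMESTAMP format above, this hypothetical helper renders a JavaScript `Date` in the `yyyy-mm-dd hh:mm:ss.fff` shape. It deliberately formats in UTC, whereas the engine interprets TIMESTAMP values in the session time zone, so treat it as a formatting sketch only.

```typescript
// Hypothetical helper: formats a Date as yyyy-mm-dd hh:mm:ss.fff in UTC.
// Assumption: UTC stands in for the session time zone.
function toTimestampLiteral(d: Date): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const date = `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
  const time = `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;
  return `${date} ${time}.${pad(d.getUTCMilliseconds(), 3)}`;
}

// Date.UTC months are zero-based: 8 is September.
console.log(`TIMESTAMP '${toTimestampLiteral(new Date(Date.UTC(2008, 8, 15, 3, 4, 5, 324)))}'`);
// TIMESTAMP '2008-09-15 03:04:05.324'
```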

String

| Name | Type | Comments |
| --- | --- | --- |
| STRING | Constant | A string literal enclosed in single or double quotes |
| decimal(precision: number, scale?: number) | Function | precision is the total number of digits. scale (optional) is the number of digits in the fractional part, with a default of 0. For example, use these type definitions: decimal(11,5), decimal(15) |
| char(length: number) | Function | Fixed length character data, with a specified length between 1 and 255, such as char(10) |
| varchar(length: number) | Function | Variable length character data, with a specified length between 1 and 65535, such as varchar(10) |
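The parameterized string types can be sketched as small functions that enforce the documented ranges and produce the type strings from the examples. The names `decimalType`, `charType`, and `varcharType` are hypothetical, chosen to avoid confusion with the library's own `glue.Schema` helpers; this sketch does not reproduce their validation.

```typescript
// Hypothetical helpers mirroring the parameterized types in the table above.
function decimalType(precision: number, scale?: number): string {
  // An omitted scale defaults to 0, so decimal(15) means decimal(15,0).
  return scale === undefined ? `decimal(${precision})` : `decimal(${precision},${scale})`;
}

function charType(length: number): string {
  if (!Number.isInteger(length) || length < 1 || length > 255) {
    throw new RangeError(`char length must be between 1 and 255, got ${length}`);
  }
  return `char(${length})`;
}

function varcharType(length: number): string {
  if (!Number.isInteger(length) || length < 1 || length > 65535) {
    throw new RangeError(`varchar length must be between 1 and 65535, got ${length}`);
  }
  return `varchar(${length})`;
}

console.log(decimalType(11, 5), charType(10), varcharType(10)); // decimal(11,5) char(10) varchar(10)
```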

Miscellaneous

| Name | Type | Comments |
| --- | --- | --- |
| BOOLEAN | Constant | Values are true and false |
| BINARY | Constant | Value is in binary |

Complex

| Name | Type | Comments |
| --- | --- | --- |
| array(itemType: Type) | Function | An array of some other type |
| map(keyType: Type, valueType: Type) | Function | A map of some primitive key type to any value type |
| struct(columns: Column[]) | Function | Nested structure containing individually named and typed columns |

# Packages

Package jsii contains the functionality needed for jsii packages to initialize their dependencies and themselves.

# Functions

Job code from a local disk path.
Job code as an S3 object.
Job code from a local disk path.
Job code as an S3 object.
Creates a Connection construct that represents an external connection.
Creates a Connection construct that represents an external connection.
Checks if `x` is a construct.
Check whether the given construct is a Resource.
Checks if `x` is a construct.
Check whether the given construct is a Resource.
Custom Glue version.
Creates a Glue Job.
Checks if `x` is a construct.
Check whether the given construct is a Resource.
Create a custom JobExecutable.
Create Python executable props for Apache Spark ETL job.
Create Python executable props for python shell jobs.
Create Python executable props for Apache Spark Streaming job.
Create Scala executable props for Apache Spark ETL job.
Create Scala executable props for Apache Spark Streaming job.
Custom type name.
Job code from a local disk path.
Job code as an S3 object.
Creates an array of some other type.
Fixed length character data, with a specified length between 1 and 255.
Creates a decimal type.
Creates a map of some primitive key type to some value type.
Creates a nested structure containing individually named and typed columns.
Variable length character data, with a specified length between 1 and 65535.
Creates a SecurityConfiguration construct that represents an external security configuration.
Checks if `x` is a construct.
Check whether the given construct is a Resource.
Creates a Table construct that represents an external table.
Checks if `x` is a construct.
Check whether the given construct is a Resource.
Custom worker type.


# Structs

CloudWatch Logs encryption configuration.
Result of binding `Code` into a `Job`.
A column of a table.
Base Connection Options.
Construction properties for `Connection`.
Properties for enabling Continuous Logging for Glue Jobs.
Properties of a DataFormat instance.
Attributes for importing `Job`.
Job bookmarks encryption configuration.
Result of binding a `JobExecutable` into a `Job`.
Construction properties for `Job`.
Props for creating a Python shell job executable.
Props for creating a Python Spark (ETL or Streaming) job executable.
S3 encryption configuration.
Props for creating a Scala Spark (ETL or Streaming) job executable.
Construction properties for `SecurityConfiguration`.
The Spark UI logging location.
Properties for enabling Spark UI monitoring feature for Spark-based Glue jobs.
Represents a type of a column in a table schema.

# Interfaces

Job Code from a local file.
Classification string given to tables with this data format.
Represents a Glue Job's Code assets (an asset can be a script, a JAR, a Python file, or any other file).
An AWS Glue connection to a data source.
The type of the glue connection.
A Glue database.
Defines the input/output formats and ser/de for a single DataFormat.
AWS Glue version determines the versions of Apache Spark and Python that are available to the job.
Interface representing a created or an imported `Connection`.
Interface representing a created or an imported `Job`.
Absolute class name of the Hadoop `InputFormat` to use when reading table files.
Interface representing a created or an imported `SecurityConfiguration`.
A Glue Job.
The executable properties related to the Glue job's GlueVersion, JobType and code.
The job type.
Absolute class name of the Hadoop `OutputFormat` to use when writing table files.
Glue job Code from an S3 bucket.
See: https://docs.aws.amazon.com/athena/latest/ug/data-types.html
A security configuration is a set of security properties that can be used by AWS Glue to encrypt data at rest.
Serialization library to use when serializing/deserializing (SerDe) table records.
A Glue table.
The type of predefined worker that is allocated when a job runs.

# Type aliases

Encryption mode for CloudWatch Logs.
Encryption mode for Job Bookmarks.
Runtime language of the Glue job.
Job states emitted by Glue to CloudWatch Events.
The Glue CloudWatch metric type.
Python version.
Encryption mode for S3.
Encryption options for a Table.