github.com/aws/aws-cdk-go/awscdkawsgluealpha/v2

2.0.0-rc.24

Repository: https://github.com/aws/aws-cdk-go.git

Documentation: pkg.go.dev

# README

AWS Glue Construct Library

All classes with the Cfn prefix in this module (CFN Resources) are always stable and safe to use.

The APIs of higher level constructs in this module are experimental and under active development. They are subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model and breaking changes will be announced in the release notes. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

This module is part of the AWS Cloud Development Kit project.

Job

A Job encapsulates a script that connects to data sources, processes them, and then writes output to a data target.

There are 3 types of jobs supported by AWS Glue: Spark ETL, Spark Streaming, and Python Shell jobs.

The glue.JobExecutable allows you to specify the type of job, the language to use and the code assets required by the job.

glue.Code allows you to refer to the different code assets required by the job, either from an existing S3 location or from a local file path.
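The factory functions listed in this package's index (scalaEtl, scalaStreaming, pythonEtl, pythonStreaming, pythonShell) suggest which job type and language pairs are available. The sketch below models those pairs in plain TypeScript. The names `ExecutableSpec` and `describeExecutable` are hypothetical and are not part of the library; the required fields mirror the examples in this README rather than the library's own validation.

```typescript
// Hypothetical model of the job type / language pairs; not the library's API.
type JobKind = 'ETL' | 'STREAMING' | 'PYTHON_SHELL';
type Language = 'PYTHON' | 'SCALA';

interface ExecutableSpec {
  kind: JobKind;
  language: Language;
  script: string;          // location of the main script
  className?: string;      // Scala only: class that holds the entry point
  pythonVersion?: string;  // Python only
}

// Returns a one-line summary, or throws when the combination is not one of
// the five pairs exposed by the factory functions.
function describeExecutable(spec: ExecutableSpec): string {
  if (spec.kind === 'PYTHON_SHELL' && spec.language !== 'PYTHON') {
    throw new Error('Python shell jobs only run Python scripts');
  }
  if (spec.language === 'SCALA' && spec.className === undefined) {
    throw new Error('Scala jobs need a className');
  }
  if (spec.language === 'PYTHON' && spec.pythonVersion === undefined) {
    throw new Error('Python jobs need a pythonVersion');
  }
  return spec.kind + '/' + spec.language + ' running ' + spec.script;
}
```

Modeling the pair as data makes the one unsupported combination (a Scala shell job) an explicit error instead of a silent misconfiguration.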

Spark Jobs

These jobs run in an Apache Spark environment managed by AWS Glue.

ETL Jobs

An ETL job processes data in batches using Apache Spark.

new glue.Job(stack, 'ScalaSparkEtlJob', {
  executable: glue.JobExecutable.scalaEtl({
    glueVersion: glue.GlueVersion.V2_0,
    script: glue.Code.fromBucket(bucket, 'src/com/example/HelloWorld.scala'),
    className: 'com.example.HelloWorld',
    extraJars: [glue.Code.fromBucket(bucket, 'jars/HelloWorld.jar')],
  }),
  description: 'an example Scala ETL job',
});

Streaming Jobs

A Streaming job is similar to an ETL job, except that it performs ETL on data streams. It uses the Apache Spark Structured Streaming framework. Some Spark job features are not available to streaming ETL jobs.

new glue.Job(stack, 'PythonSparkStreamingJob', {
  executable: glue.JobExecutable.pythonStreaming({
    glueVersion: glue.GlueVersion.V2_0,
    pythonVersion: glue.PythonVersion.THREE,
    script: glue.Code.fromAsset(path.join(__dirname, 'job-script/hello_world.py')),
  }),
  description: 'an example Python Streaming job',
});

Python Shell Jobs

A Python shell job runs Python scripts as a shell and supports a Python version that depends on the AWS Glue version you are using. This can be used to schedule and run tasks that don't require an Apache Spark environment.

new glue.Job(stack, 'PythonShellJob', {
  executable: glue.JobExecutable.pythonShell({
    glueVersion: glue.GlueVersion.V1_0,
    pythonVersion: glue.PythonVersion.THREE,
    script: glue.Code.fromBucket(bucket, 'script.py'),
  }),
  description: 'an example Python Shell job',
});

See documentation for more information on adding jobs in Glue.

Connection

A Connection allows Glue jobs, crawlers and development endpoints to access certain types of data stores. For example, to create a network connection to connect to a data source within a VPC:

new glue.Connection(stack, 'MyConnection', {
  type: glue.ConnectionType.NETWORK,
  // The security groups granting AWS Glue inbound access to the data source within the VPC
  securityGroups: [securityGroup],
  // The VPC subnet which contains the data source
  subnet,
});

If you need to use a connection type that doesn't exist as a static member on ConnectionType, you can instantiate a ConnectionType object, e.g. new glue.ConnectionType('NEW_TYPE').
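The combination of static members and a public constructor is an enum-like class pattern. A minimal sketch of that pattern follows, using a hypothetical class named `ConnectionKind` so it is not mistaken for the library's own ConnectionType implementation; the four static names are taken from this package's index.

```typescript
// Hypothetical enum-like class: well-known values are static members, and
// the public constructor remains available for values not listed here.
class ConnectionKind {
  static readonly JDBC = new ConnectionKind('JDBC');
  static readonly KAFKA = new ConnectionKind('KAFKA');
  static readonly MONGODB = new ConnectionKind('MONGODB');
  static readonly NETWORK = new ConnectionKind('NETWORK');

  readonly typeName: string;

  constructor(typeName: string) {
    this.typeName = typeName;
  }

  // The string form is the raw type name.
  toString(): string {
    return this.typeName;
  }
}

// A value that has no static member yet.
const customKind = new ConnectionKind('NEW_TYPE');
```

Unlike a closed enum, this shape lets callers use a new service-side value before the library adds a named constant for it.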

See Adding a Connection to Your Data Store and Connection Structure documentation for more information on the supported data stores and their configurations.

SecurityConfiguration

A SecurityConfiguration is a set of security properties that can be used by AWS Glue to encrypt data at rest.

new glue.SecurityConfiguration(stack, 'MySecurityConfiguration', {
  securityConfigurationName: 'name',
  cloudWatchEncryption: {
    mode: glue.CloudWatchEncryptionMode.KMS,
  },
  jobBookmarksEncryption: {
    mode: glue.JobBookmarksEncryptionMode.CLIENT_SIDE_KMS,
  },
  s3Encryption: {
    mode: glue.S3EncryptionMode.KMS,
  },
});

By default, a shared KMS key is created for use with the encryption configurations that require one. You can also supply your own key for each encryption config, for example, for CloudWatch encryption:

new glue.SecurityConfiguration(stack, 'MySecurityConfiguration', {
  securityConfigurationName: 'name',
  cloudWatchEncryption: {
    mode: glue.CloudWatchEncryptionMode.KMS,
    kmsKey: key,
  },
});
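One way to read the default described above is that a shared key is only needed when at least one configured encryption setting requires a KMS key and does not supply its own. The sketch below expresses that reading in plain TypeScript. The interfaces and the `sharedKeyRequired` function are hypothetical simplifications (keys are represented as plain strings) and may not match the library's exact behavior.

```typescript
// Hypothetical, simplified shapes; a key is represented by a plain string.
interface EncryptionSetting {
  mode: string;
  kmsKey?: string;
}

interface SecurityConfigSketch {
  cloudWatchEncryption?: EncryptionSetting;
  jobBookmarksEncryption?: EncryptionSetting;
  s3Encryption?: EncryptionSetting;
}

// True when the setting is present, uses a KMS-based mode, and has no key.
function needsKey(setting: EncryptionSetting | undefined): boolean {
  // S3_MANAGED relies on an S3-managed key, so it never needs a KMS key.
  return setting !== undefined
    && setting.mode !== 'S3_MANAGED'
    && setting.kmsKey === undefined;
}

// True when any of the three settings would fall back to a shared key.
function sharedKeyRequired(config: SecurityConfigSketch): boolean {
  return needsKey(config.cloudWatchEncryption)
    || needsKey(config.jobBookmarksEncryption)
    || needsKey(config.s3Encryption);
}
```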

See the documentation for more information on how Glue encrypts data written by crawlers, jobs, and development endpoints.

Database

A Database is a logical grouping of Tables in the Glue Catalog.

new glue.Database(stack, 'MyDatabase', {
  databaseName: 'my_database'
});

Table

A Glue table describes a table of data in S3: its structure (column names and types), location of data (S3 objects with a common prefix in an S3 bucket), and format for the files (JSON, Avro, Parquet, etc.):

new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }, {
    name: 'col2',
    type: glue.Schema.array(glue.Schema.STRING),
    comment: 'col2 is an array of strings' // comment is optional
  }],
  dataFormat: glue.DataFormat.JSON
});

By default, an S3 bucket will be created to store the table's data but you can manually pass the bucket and s3Prefix:

new glue.Table(stack, 'MyTable', {
  bucket: myBucket,
  s3Prefix: 'my-table/',
  ...
});

By default, the table's data is stored in the bucket root; pass s3Prefix to store it under a prefix instead.

Partitions

To improve query performance, a table can specify partitionKeys on which data is stored and queried separately. For example, you might partition a table by year and month to optimize queries based on a time window:

new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING
  }],
  partitionKeys: [{
    name: 'year',
    type: glue.Schema.SMALL_INT
  }, {
    name: 'month',
    type: glue.Schema.SMALL_INT
  }],
  dataFormat: glue.DataFormat.JSON
});
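To make "stored separately" concrete, the sketch below builds the S3 key prefix for one partition. It assumes Hive-style `key=value` path segments, which is a common convention but is not stated in this README, and `partitionPrefix` is a hypothetical helper rather than a library function.

```typescript
// One partition key together with the value for a particular partition.
interface PartitionValue {
  key: string;
  value: string | number;
}

// Builds the S3 key prefix for a partition, ASSUMING Hive-style
// "key=value/" segments appended to the table prefix in key order.
function partitionPrefix(tablePrefix: string, parts: PartitionValue[]): string {
  const segments = parts.map(function (p) {
    return p.key + '=' + String(p.value);
  });
  return tablePrefix + segments.join('/') + (segments.length > 0 ? '/' : '');
}

// Objects for September 2021 would share this prefix under the assumption.
const september = partitionPrefix('my-table/', [
  { key: 'year', value: 2021 },
  { key: 'month', value: 9 },
]);
```

A query restricted to one year and month then only needs to read objects under a single prefix, which is where the performance benefit comes from.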

Encryption

You can enable encryption on a Table's data:

Unencrypted - files are not encrypted. The default encryption setting.
S3Managed - Server side encryption (SSE-S3) with an Amazon S3-managed key.

new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.S3_MANAGED,
  ...
});

Kms - Server-side encryption (SSE-KMS) with an AWS KMS Key managed by the account owner.

// KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS,
  ...
});

// with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS,
  encryptionKey: new kms.Key(stack, 'MyKey'),
  ...
});

KmsManaged - Server-side encryption (SSE-KMS), like Kms, except with an AWS KMS Key managed by the AWS Key Management Service.

new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KMS_MANAGED,
  ...
});

ClientSideKms - Client-side encryption (CSE-KMS) with an AWS KMS Key managed by the account owner.

// KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
  ...
});

// with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
  encryptionKey: new kms.Key(stack, 'MyKey'),
  ...
});

Note: you cannot provide a Bucket when creating the Table if you wish to use server-side encryption (KMS, KMS_MANAGED or S3_MANAGED).
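The restriction in the note can be written as a small check. This is a plain TypeScript sketch with hypothetical names (`TableEncryptionOptions`, `checkTableEncryption`); it models the rule as stated here and is not the library's own validation code.

```typescript
// The five encryption options described above, as string literals.
type TableEncryptionMode =
  | 'UNENCRYPTED'
  | 'S3_MANAGED'
  | 'KMS'
  | 'KMS_MANAGED'
  | 'CLIENT_SIDE_KMS';

interface TableEncryptionOptions {
  encryption?: TableEncryptionMode; // defaults to UNENCRYPTED
  hasBucket: boolean;               // whether the caller passed a bucket
}

// Server-side modes are the three named in the note.
function isServerSide(mode: TableEncryptionMode): boolean {
  return mode === 'S3_MANAGED' || mode === 'KMS' || mode === 'KMS_MANAGED';
}

// Throws when a bucket is combined with a server-side encryption mode.
function checkTableEncryption(options: TableEncryptionOptions): void {
  const mode: TableEncryptionMode =
    options.encryption === undefined ? 'UNENCRYPTED' : options.encryption;
  if (options.hasBucket && isServerSide(mode)) {
    throw new Error('cannot pass a bucket together with ' + mode + ' encryption');
  }
}
```

Under this rule, client-side encryption (CLIENT_SIDE_KMS) and unencrypted tables remain compatible with a caller-provided bucket.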

Types

A table's schema is a collection of columns, each of which has a name and a type. Types are recursive structures, consisting of primitive and complex types:

new glue.Table(stack, 'MyTable', {
  columns: [{
    name: 'primitive_column',
    type: glue.Schema.STRING
  }, {
    name: 'array_column',
    type: glue.Schema.array(glue.Schema.INTEGER),
    comment: 'array<integer>'
  }, {
    name: 'map_column',
    type: glue.Schema.map(
      glue.Schema.STRING,
      glue.Schema.TIMESTAMP),
    comment: 'map<string,timestamp>'
  }, {
    name: 'struct_column',
    type: glue.Schema.struct([{
      name: 'nested_column',
      type: glue.Schema.DATE,
      comment: 'nested comment'
    }]),
    comment: "struct<nested_column:date COMMENT 'nested comment'>"
  }],
  ...
});

Primitives

Numeric

Name	Type	Comments
FLOAT	Constant	A 32-bit single-precision floating point number
INTEGER	Constant	A 32-bit signed value in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31-1
DOUBLE	Constant	A 64-bit double-precision floating point number
BIG_INT	Constant	A 64-bit signed integer in two’s complement format, with a minimum value of -2^63 and a maximum value of 2^63-1
SMALL_INT	Constant	A 16-bit signed integer in two’s complement format, with a minimum value of -2^15 and a maximum value of 2^15-1
TINY_INT	Constant	An 8-bit signed integer in two’s complement format, with a minimum value of -2^7 and a maximum value of 2^7-1
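The ranges in the table follow from the bit widths: an n-bit two's complement integer spans -2^(n-1) to 2^(n-1)-1. The sketch below picks the narrowest integer type that can hold a value. `smallestIntegerType` is a hypothetical helper, not a library function, and it only accepts values that JavaScript can represent exactly.

```typescript
interface IntegerType {
  typeName: string;
  bits: number;
}

// Ordered from narrowest to widest, using the names from the table above.
const INTEGER_TYPES: IntegerType[] = [
  { typeName: 'TINY_INT', bits: 8 },
  { typeName: 'SMALL_INT', bits: 16 },
  { typeName: 'INTEGER', bits: 32 },
  { typeName: 'BIG_INT', bits: 64 },
];

// True when value lies in [-2^(bits-1), 2^(bits-1) - 1].
function fitsInSigned(value: number, bits: number): boolean {
  const limit = Math.pow(2, bits - 1); // powers of two are exact as doubles
  return value >= -limit && value < limit;
}

// Returns the name of the narrowest type that can hold the value.
function smallestIntegerType(value: number): string {
  const maxSafe = 9007199254740991; // 2^53 - 1
  if (Math.floor(value) !== value || Math.abs(value) > maxSafe) {
    throw new Error('value must be an exactly representable integer');
  }
  for (let i = 0; i < INTEGER_TYPES.length; i++) {
    if (fitsInSigned(value, INTEGER_TYPES[i].bits)) {
      return INTEGER_TYPES[i].typeName;
    }
  }
  throw new Error('unreachable: every accepted value fits in 64 bits');
}
```

For example, 127 fits in a TINY_INT while 128 needs a SMALL_INT, because the positive limit is one less than the negative limit's magnitude.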

Date and time

Name	Type	Comments
DATE	Constant	A date in UNIX format, such as YYYY-MM-DD.
TIMESTAMP	Constant	Date and time instant in the UNIX format, such as yyyy-mm-dd hh:mm:ss[.f...]. For example, TIMESTAMP '2008-09-15 03:04:05.324'. This format uses the session time zone.
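The two written forms in the table can be checked with simple patterns. The sketch below only tests the shape of the text (digit groups and separators); it does not validate calendar ranges or time zones, and both function names are hypothetical.

```typescript
// Shape of "YYYY-MM-DD".
const DATE_SHAPE = /^\d{4}-\d{2}-\d{2}$/;

// Shape of "yyyy-mm-dd hh:mm:ss" with an optional fractional part.
const TIMESTAMP_SHAPE = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?$/;

// True when the text has the DATE layout shown in the table.
function looksLikeDate(text: string): boolean {
  return DATE_SHAPE.test(text);
}

// True when the text has the TIMESTAMP layout shown in the table.
function looksLikeTimestamp(text: string): boolean {
  return TIMESTAMP_SHAPE.test(text);
}
```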

String

Name	Type	Comments
STRING	Constant	A string literal enclosed in single or double quotes
decimal(precision: number, scale?: number)	Function	`precision` is the total number of digits. `scale` (optional) is the number of digits in fractional part with a default of 0. For example, use these type definitions: decimal(11,5), decimal(15)
char(length: number)	Function	Fixed length character data, with a specified length between 1 and 255, such as char(10)
varchar(length: number)	Function	Variable length character data, with a specified length between 1 and 65535, such as varchar(10)
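The parameterized types in the table carry their limits in the description. The sketch below renders the written forms shown there (such as decimal(11,5) and char(10)) and enforces the stated length bounds. The helper names are hypothetical, and the exact strings the library produces may differ.

```typescript
// Throws unless length is a whole number within [min, max].
function checkLength(kind: string, length: number, min: number, max: number): void {
  if (Math.floor(length) !== length || length < min || length > max) {
    throw new Error(kind + ' length must be an integer between ' + min + ' and ' + max);
  }
}

// decimal(precision) when scale is omitted, else decimal(precision,scale).
function decimalType(precision: number, scale?: number): string {
  if (scale === undefined) {
    return 'decimal(' + precision + ')';
  }
  return 'decimal(' + precision + ',' + scale + ')';
}

// Fixed-length character data: length between 1 and 255.
function charType(length: number): string {
  checkLength('char', length, 1, 255);
  return 'char(' + length + ')';
}

// Variable-length character data: length between 1 and 65535.
function varcharType(length: number): string {
  checkLength('varchar', length, 1, 65535);
  return 'varchar(' + length + ')';
}
```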

Miscellaneous

Name	Type	Comments
BOOLEAN	Constant	Values are `true` and `false`
BINARY	Constant	Value is in binary

Complex

Name	Type	Comments
array(itemType: Type)	Function	An array of some other type
map(keyType: Type, valueType: Type)	Function	A map of some primitive key type to any value type
struct(columns: Column[])	Function	Nested structure containing individually named and typed columns
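Because complex types can contain other types, rendering a type to its written form is naturally recursive. The sketch below uses a hypothetical, simplified model (`GlueType`, `renderType`) that is not the library's own Schema class; the output format follows the comment strings in the Types example above, for instance struct<nested_column:date COMMENT 'nested comment'>.

```typescript
// Hypothetical, simplified model of a column type.
type GlueType =
  | { kind: 'primitive'; typeName: string }
  | { kind: 'array'; item: GlueType }
  | { kind: 'map'; key: GlueType; value: GlueType }
  | { kind: 'struct'; columns: StructColumn[] };

interface StructColumn {
  columnName: string;
  type: GlueType;
  comment?: string;
}

// Renders a type to its written form, recursing into nested types.
function renderType(t: GlueType): string {
  switch (t.kind) {
    case 'primitive':
      return t.typeName;
    case 'array':
      return 'array<' + renderType(t.item) + '>';
    case 'map':
      // The table above describes map keys as primitive types.
      if (t.key.kind !== 'primitive') {
        throw new Error('map keys must be primitive types');
      }
      return 'map<' + renderType(t.key) + ',' + renderType(t.value) + '>';
    case 'struct': {
      const fields = t.columns.map(function (c) {
        const base = c.columnName + ':' + renderType(c.type);
        return c.comment === undefined ? base : base + " COMMENT '" + c.comment + "'";
      });
      return 'struct<' + fields.join(',') + '>';
    }
  }
}
```

The discriminated union keeps each case explicit, so adding a new kind of type forces the renderer to handle it.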

# Packages

jsii

Package jsii contains the functionality needed for jsii packages to initialize their dependencies and themselves.

# Functions

AssetCode_FromAsset

Job code from a local disk path.

AssetCode_FromBucket

Job code as an S3 object.

ClassificationString_AVRO

No description provided by the author

ClassificationString_CSV

No description provided by the author

ClassificationString_JSON

No description provided by the author

ClassificationString_ORC

No description provided by the author

ClassificationString_PARQUET

No description provided by the author

ClassificationString_XML

No description provided by the author

Code_FromAsset

Job code from a local disk path.

Code_FromBucket

Job code as an S3 object.

Connection_FromConnectionArn

Creates a Connection construct that represents an external connection.

Connection_FromConnectionName

Creates a Connection construct that represents an external connection.

Connection_IsConstruct

Checks if `x` is a construct.

Connection_IsResource

Check whether the given construct is a Resource.

ConnectionType_JDBC

No description provided by the author

ConnectionType_KAFKA

No description provided by the author

ConnectionType_MONGODB

No description provided by the author

ConnectionType_NETWORK

No description provided by the author

Database_FromDatabaseArn

Experimental.

Database_IsConstruct

Checks if `x` is a construct.

Database_IsResource

Check whether the given construct is a Resource.

DataFormat_APACHE_LOGS

No description provided by the author

DataFormat_AVRO

No description provided by the author

DataFormat_CLOUDTRAIL_LOGS

No description provided by the author

DataFormat_CSV

No description provided by the author

DataFormat_JSON

No description provided by the author

DataFormat_LOGSTASH

No description provided by the author

DataFormat_ORC

No description provided by the author

DataFormat_PARQUET

No description provided by the author

DataFormat_TSV

No description provided by the author

GlueVersion_Of

Custom Glue version.

GlueVersion_V0_9

No description provided by the author

GlueVersion_V1_0

No description provided by the author

GlueVersion_V2_0

No description provided by the author

GlueVersion_V3_0

No description provided by the author

InputFormat_AVRO

No description provided by the author

InputFormat_CLOUDTRAIL

No description provided by the author

InputFormat_ORC

No description provided by the author

InputFormat_PARQUET

No description provided by the author

InputFormat_TEXT

No description provided by the author

Job_FromJobAttributes

Creates a Glue Job.

Job_IsConstruct

Checks if `x` is a construct.

Job_IsResource

Check whether the given construct is a Resource.

JobExecutable_Of

Create a custom JobExecutable.

JobExecutable_PythonEtl

Create Python executable props for Apache Spark ETL job.

JobExecutable_PythonShell

Create Python executable props for python shell jobs.

JobExecutable_PythonStreaming

Create Python executable props for Apache Spark Streaming job.

JobExecutable_ScalaEtl

Create Scala executable props for Apache Spark ETL job.

JobExecutable_ScalaStreaming

Create Scala executable props for Apache Spark Streaming job.

JobType_ETL

No description provided by the author

JobType_Of

Custom type name.

JobType_PYTHON_SHELL

No description provided by the author

JobType_STREAMING

No description provided by the author

NewAssetCode

Experimental.

NewAssetCode_Override

Experimental.

NewClassificationString

Experimental.

NewClassificationString_Override

Experimental.

NewCode_Override

Experimental.

NewConnection

Experimental.

NewConnection_Override

Experimental.

NewConnectionType

Experimental.

NewConnectionType_Override

Experimental.

NewDatabase

Experimental.

NewDatabase_Override

Experimental.

NewDataFormat

Experimental.

NewDataFormat_Override

Experimental.

NewInputFormat

Experimental.

NewInputFormat_Override

Experimental.

NewJob

Experimental.

NewJob_Override

Experimental.

NewOutputFormat

Experimental.

NewOutputFormat_Override

Experimental.

Experimental.

Experimental.

Experimental.

Experimental.

NewSecurityConfiguration

Experimental.

NewSecurityConfiguration_Override

Experimental.

NewSerializationLibrary

Experimental.

NewSerializationLibrary_Override

Experimental.

NewTable

Experimental.

NewTable_Override

Experimental.

OutputFormat_AVRO

No description provided by the author

OutputFormat_HIVE_IGNORE_KEY_TEXT

No description provided by the author

OutputFormat_ORC

No description provided by the author

OutputFormat_PARQUET

No description provided by the author

S3Code_FromAsset

Job code from a local disk path.

S3Code_FromBucket

Job code as an S3 object.

Schema_Array

Creates an array of some other type.

Schema_BIG_INT

No description provided by the author

Schema_BINARY

No description provided by the author

Schema_BOOLEAN

No description provided by the author

Schema_Char

Fixed length character data, with a specified length between 1 and 255.

Schema_DATE

No description provided by the author

Schema_Decimal

Creates a decimal type.

Schema_DOUBLE

No description provided by the author

Schema_FLOAT

No description provided by the author

Schema_INTEGER

No description provided by the author

Schema_Map

Creates a map of some primitive key type to some value type.

Schema_SMALL_INT

No description provided by the author

Schema_STRING

No description provided by the author

Schema_Struct

Creates a nested structure containing individually named and typed columns.

Schema_TIMESTAMP

No description provided by the author

Schema_TINY_INT

No description provided by the author

Schema_Varchar

Variable length character data, with a specified length between 1 and 65535.

SecurityConfiguration_FromSecurityConfigurationName

Creates a SecurityConfiguration construct that represents an external security configuration.

SecurityConfiguration_IsConstruct

Checks if `x` is a construct.

SecurityConfiguration_IsResource

Check whether the given construct is a Resource.

SerializationLibrary_AVRO

No description provided by the author

SerializationLibrary_CLOUDTRAIL

No description provided by the author

SerializationLibrary_GROK

No description provided by the author

SerializationLibrary_HIVE_JSON

No description provided by the author

SerializationLibrary_LAZY_SIMPLE

No description provided by the author

SerializationLibrary_OPEN_CSV

No description provided by the author

SerializationLibrary_OPENX_JSON

No description provided by the author

SerializationLibrary_ORC

No description provided by the author

SerializationLibrary_PARQUET

No description provided by the author

SerializationLibrary_REGEXP

No description provided by the author

Table_FromTableArn

Experimental.

Table_FromTableAttributes

Creates a Table construct that represents an external table.

Table_IsConstruct

Checks if `x` is a construct.

Table_IsResource

Check whether the given construct is a Resource.

WorkerType_G_1X

No description provided by the author

WorkerType_G_2X

No description provided by the author

WorkerType_Of

Custom worker type.

WorkerType_STANDARD

No description provided by the author

# Constants

CloudWatchEncryptionMode_KMS

No description provided by the author

JobBookmarksEncryptionMode_CLIENT_SIDE_KMS

No description provided by the author

JobLanguage_PYTHON

No description provided by the author

JobLanguage_SCALA

No description provided by the author

JobState_FAILED

No description provided by the author

JobState_RUNNING

No description provided by the author

JobState_STARTING

No description provided by the author

JobState_STOPPED

No description provided by the author

JobState_STOPPING

No description provided by the author

JobState_SUCCEEDED

No description provided by the author

JobState_TIMEOUT

No description provided by the author

MetricType_COUNT

No description provided by the author

MetricType_GAUGE

No description provided by the author

PythonVersion_THREE

No description provided by the author

PythonVersion_TWO

No description provided by the author

S3EncryptionMode_KMS

No description provided by the author

S3EncryptionMode_S3_MANAGED

No description provided by the author

TableEncryption_CLIENT_SIDE_KMS

No description provided by the author

TableEncryption_KMS

No description provided by the author

TableEncryption_KMS_MANAGED

No description provided by the author

TableEncryption_S3_MANAGED

No description provided by the author

TableEncryption_UNENCRYPTED

No description provided by the author

# Structs

CloudWatchEncryption

CloudWatch Logs encryption configuration.

CodeConfig

Result of binding `Code` into a `Job`.

Column

A column of a table.

ConnectionOptions

Base Connection Options.

ConnectionProps

Construction properties for {@link Connection}.

ContinuousLoggingProps

Properties for enabling Continuous Logging for Glue Jobs.

DatabaseProps

Experimental.

DataFormatProps

Properties of a DataFormat instance.

JobAttributes

Attributes for importing {@link Job}.

JobBookmarksEncryption

Job bookmarks encryption configuration.

JobExecutableConfig

Result of binding a `JobExecutable` into a `Job`.

JobProps

Construction properties for {@link Job}.

PythonShellExecutableProps

Props for creating a Python shell job executable.

PythonSparkJobExecutableProps

Props for creating a Python Spark (ETL or Streaming) job executable.

S3Encryption

S3 encryption configuration.

ScalaJobExecutableProps

Props for creating a Scala Spark (ETL or Streaming) job executable.

SecurityConfigurationProps

Construction properties of {@link SecurityConfiguration}.

SparkUILoggingLocation

The Spark UI logging location.

SparkUIProps

Properties for enabling Spark UI monitoring feature for Spark-based Glue jobs.

TableAttributes

Experimental.

TableProps

Experimental.

Type

Represents a type of a column in a table schema.

# Interfaces

AssetCode

Job Code from a local file.

ClassificationString

Classification string given to tables with this data format.

Code

Represents a Glue Job's Code assets (an asset can be a script, a jar, a Python file or any other file).

Connection

An AWS Glue connection to a data source.

ConnectionType

The type of the glue connection.

Database

A Glue database.

DataFormat

Defines the input/output formats and ser/de for a single DataFormat.

GlueVersion

AWS Glue version determines the versions of Apache Spark and Python that are available to the job.

IConnection

Interface representing a created or an imported {@link Connection}.

IDatabase

Experimental.

IJob

Interface representing a created or an imported {@link Job}.

InputFormat

Absolute class name of the Hadoop `InputFormat` to use when reading table files.

ISecurityConfiguration

Interface representing a created or an imported {@link SecurityConfiguration}.

ITable

Experimental.

Job

A Glue Job.

JobExecutable

The executable properties related to the Glue job's GlueVersion, JobType and code.

JobType

The job type.

OutputFormat

Absolute class name of the Hadoop `OutputFormat` to use when writing table files.

S3Code

Glue job Code from an S3 bucket.

Schema

See: https://docs.aws.amazon.com/athena/latest/ug/data-types.html Experimental.

SecurityConfiguration

A security configuration is a set of security properties that can be used by AWS Glue to encrypt data at rest.

SerializationLibrary

Serialization library to use when serializing/deserializing (SerDe) table records.

Table

A Glue table.

WorkerType

The type of predefined worker that is allocated when a job runs.

# Type aliases

CloudWatchEncryptionMode

Encryption mode for CloudWatch Logs.

JobBookmarksEncryptionMode

Encryption mode for Job Bookmarks.

JobLanguage

Runtime language of the Glue job.

JobState

Job states emitted by Glue to CloudWatch Events.

MetricType

The Glue CloudWatch metric type.

PythonVersion

Python version.

S3EncryptionMode

Encryption mode for S3.

TableEncryption

Encryption options for a Table.