github.com/hashicorp/go-extract
Category: module, package
Version: 1.1.0
Repository: https://github.com/hashicorp/go-extract.git
Documentation: pkg.go.dev

# README

go-extract

License: MPL-2.0

This library provides secure decompression and extraction for formats like 7-Zip, Brotli, Bzip2, GZip, LZ4, Rar (excluding symlinks), Snappy, Tar, Xz, Zip, Zlib, and Zstandard. It safeguards against resource exhaustion, path traversal, and symlink attacks. Additionally, it offers various configuration options and collects telemetry data during extraction.

Installation Instructions

Add hashicorp/go-extract as a dependency to your project:

go get github.com/hashicorp/go-extract

Build hashicorp/go-extract from source and install it to the system as a command-line utility:

git clone git@github.com:hashicorp/go-extract.git
cd go-extract
make
make test
make install

Install hashicorp/go-extract directly from GitHub:

go install github.com/hashicorp/go-extract/cmd/goextract@latest

Usage Examples

These examples demonstrate how to use hashicorp/go-extract both as a library and as a command-line utility.

Library

The simplest way to use the library is to call the extract.Unpack function with the default configuration. This function extracts the contents from an io.Reader to the specified destination on the local filesystem.

// Unpack the archive
if err := extract.Unpack(ctx, dst, archive, extract.NewConfig()); err != nil {
    // Handle error
    log.Fatalf("Failed to unpack archive: %v", err)
}

Command-line Utility

The goextract command-line utility offers all available configuration options via dedicated flags.

$ goextract -h
Usage: goextract <archive> [<destination>] [flags]

A secure extraction utility

Arguments:
  <archive>          Path to archive. ("-" for STDIN)
  [<destination>]    Output directory/file.

Flags:
  -h, --help                               Show context-sensitive help.
  -C, --continue-on-error                  Continue extraction on error.
  -S, --continue-on-unsupported-files      Skip extraction of unsupported files.
  -c, --create-destination                 Create destination directory if it does not exist.
      --custom-create-dir-mode=750         File mode for created directories, which are not listed in the archive. (respecting umask)
      --custom-decompress-file-mode=640    File mode for decompressed files. (respecting umask)
  -D, --deny-symlinks                      Deny symlink extraction.
  -d, --drop-file-attributes               Drop file attributes (mode, modtime, access time).
      --insecure-traverse-symlinks         Traverse symlinks to directories during extraction.
      --max-files=100000                   Maximum files (including folder and symlinks) that are extracted before stop. (disable check: -1)
      --max-extraction-size=1073741824     Maximum extraction size that allowed is (in bytes). (disable check: -1)
      --max-extraction-time=60             Maximum time that an extraction should take (in seconds). (disable check: -1)
      --max-input-size=1073741824          Maximum input size that allowed is (in bytes). (disable check: -1)
  -N, --no-untar-after-decompression       Disable combined extraction of tar.gz.
  -O, --overwrite                          Overwrite if exist.
  -P, --pattern=PATTERN,...                Extracted objects need to match shell file name pattern.
  -p, --preserve-owner                     Preserve owner and group of files from archive (only root/uid:0 on unix systems for tar files).
  -T, --telemetry                          Print telemetry data to log after extraction.
  -t, --type=""                            Type of archive. (7z, br, bz2, gz, lz4, rar, sz, tar, tgz, xz, zip, zst, zz)
  -v, --verbose                            Verbose logging.
  -V, --version                            Print release version information.
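The `--pattern` flag restricts extraction to entries whose names match a shell file name pattern. The sketch below shows how such matching behaves, using `path.Match` from the standard library. Whether goextract uses this exact function is an assumption, `matchesAny` is a hypothetical helper, and treating an empty pattern list as "match everything" is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one shell file name
// pattern. An empty pattern list matches every name (assumption of this sketch).
func matchesAny(patterns []string, name string) (bool, error) {
	if len(patterns) == 0 {
		return true, nil
	}
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{"*.tf", "modules/*/*.tf"}
	for _, name := range []string{"main.tf", "modules/iam/main.tf", "README.md"} {
		ok, err := matchesAny(patterns, name)
		fmt.Println(name, ok, err)
	}
}
```

Note that `*` does not cross a `/` in `path.Match`, so `*.tf` alone matches `main.tf` but not `modules/iam/main.tf`.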

Configuration

When calling the extract.Unpack(..) function, you need to provide a config object that holds the configuration for the extraction. The available options are:

  cfg := extract.NewConfig(
    extract.WithContinueOnError(..),
    extract.WithContinueOnUnsupportedFiles(..),
    extract.WithCreateDestination(..),
    extract.WithCustomCreateDirMode(..),
    extract.WithCustomDecompressFileMode(..),
    extract.WithDenySymlinkExtraction(..),
    extract.WithDropFileAttributes(..),
    extract.WithExtractType(..),
    extract.WithInsecureTraverseSymlinks(..),
    extract.WithLogger(..),
    extract.WithMaxExtractionSize(..),
    extract.WithMaxFiles(..),
    extract.WithMaxInputSize(..),
    extract.WithNoUntarAfterDecompression(..),
    extract.WithOverwrite(..),
    extract.WithPatterns(..),
    extract.WithPreserveOwner(..),
    extract.WithTelemetryHook(..),
  )

[..]

  if err := extract.Unpack(ctx, dst, archive, cfg); err != nil {
    log.Println(fmt.Errorf("error during extraction: %w", err))
    os.Exit(-1)
  }
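The With* functions follow Go's functional options pattern. The following standalone sketch shows how that pattern works in general. All names in it (`settings`, `option`, `withMaxFiles`, `withOverwrite`, `newSettings`) are hypothetical and do not come from go-extract; only the default of 100000 files is borrowed from the CLI help above.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for a configuration struct.
type settings struct {
	maxFiles  int64
	overwrite bool
}

// option adjusts settings; each with* helper returns one.
type option func(*settings)

// withMaxFiles sets the maximum number of entries to process.
func withMaxFiles(n int64) option {
	return func(s *settings) { s.maxFiles = n }
}

// withOverwrite controls whether existing files may be replaced.
func withOverwrite(on bool) option {
	return func(s *settings) { s.overwrite = on }
}

// newSettings starts from defaults and applies the options in order,
// so a later option overrides an earlier one for the same field.
func newSettings(opts ...option) *settings {
	s := &settings{maxFiles: 100000, overwrite: false}
	for _, o := range opts {
		o(s)
	}
	return s
}

func main() {
	s := newSettings(withMaxFiles(500), withOverwrite(true))
	fmt.Println(s.maxFiles, s.overwrite)
}
```

The benefit of this design is that callers only name the settings they want to change, and new options can be added later without breaking existing call sites.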

Telemetry

Telemetry data can be collected by specifying a telemetry hook in the configuration. This hook receives the collected telemetry data at the end of each extraction.

// create new config
cfg := extract.NewConfig(
  extract.WithTelemetryHook(func(ctx context.Context, td *extract.TelemetryData) {
    // handle telemetry data
  }),
)

Here is an example of the telemetry data collected for the extraction of terraform-aws-iam-5.34.0.tar.gz:

{
  "last_extraction_error": "",
  "extracted_dirs": 51,
  "extraction_duration": 55025584,
  "extraction_errors": 0,
  "extracted_files": 241,
  "extraction_size": 539085,
  "extracted_symlinks": 0,
  "extracted_type": "tar.gz",
  "input_size": 81477,
  "pattern_mismatches": 0,
  "unsupported_files": 0,
  "last_unsupported_file": ""
}

Extraction targets

Disk

Interact with the local operating system to create files, directories, and symlinks. Extracted entries can be accessed later using the os.* API calls.

// prepare destination and config
d := extract.NewTargetDisk()
dst := "output/"
cfg := extract.NewConfig()

// unpack
if err := extract.UnpackTo(ctx, d, dst, archive, cfg); err != nil {
    // handle error
}

// Walk the local filesystem
localFs := os.DirFS(dst)
if err := fs.WalkDir(localFs, ".", func(path string, d fs.DirEntry, err error) error {
    // process path, d and err
    return nil
}); err != nil {
    // handle error
}

Memory

Extract archives directly into memory, supporting files, directories, and symlinks. Note that file permissions are not validated. Access the extracted entries by converting the target to io/fs.FS.

// prepare destination and config
m := extract.NewTargetMemory() // create a new in-memory filesystem
dst := ""                      // root of in-memory filesystem
cfg := extract.NewConfig()     // custom config for extraction

// unpack
if err := extract.UnpackTo(ctx, m, dst, archive, cfg); err != nil {
    // handle error
}

// Walk the memory filesystem
if err := fs.WalkDir(m, ".", func(path string, d fs.DirEntry, err error) error {
    fmt.Println(path)
    return nil
}); err != nil {
    fmt.Printf("failed to walk memory filesystem: %s", err)
    return
}

Errors

If the extraction fails, you can check for specific errors returned by the extract.Unpack function:

if err := extract.Unpack(ctx, dst, archive, cfg); err != nil {
  switch {
  case errors.Is(err, extract.ErrNoExtractorFound):
    // handle no extractor found
  case errors.Is(err, extract.ErrUnsupportedFileType):
    // handle unsupported file type
  case errors.Is(err, extract.ErrFailedToReadHeader):
    // handle failed to read header
  case errors.Is(err, extract.ErrFailedToExtract):
    // handle failed extraction
  default:
    // handle other error
  }
}

# Functions

HasKnownArchiveExtension returns true if the given name has a known archive extension.
NewConfig creates a Config from the default configuration and applies opts as adjustments, in option pattern style.
NewTargetDisk creates a new TargetDisk for interacting with the local filesystem.
NewTargetMemory creates a new in-memory filesystem.
Unpack unpacks the given source to the destination, according to the given configuration, using the default OS. If cfg is nil, the default configuration is used for extraction.
UnpackTo unpacks the given source to the destination, according to the given configuration, using the given [Target].
WithCacheInMemory options pattern function to enable/disable caching in memory.
WithContinueOnError options pattern function to continue on error during extraction.
WithContinueOnUnsupportedFiles options pattern function to enable/disable skipping unsupported files.
WithCreateDestination options pattern function to create destination directory if it does not exist.
WithCustomCreateDirMode options pattern function to set the file mode for created directories that are not defined in the archive.
WithCustomDecompressFileMode options pattern function to set the file mode for a decompressed file.
WithDenySymlinkExtraction options pattern function to deny symlink extraction.
WithDropFileAttributes options pattern function to drop the file attributes of the extracted files.
WithExtractType options pattern function to set the extraction type in the [Config].
WithInsecureTraverseSymlinks options pattern function to traverse symlinks during extraction.
WithLogger options pattern function to set a custom logger.
WithMaxExtractionSize options pattern function to set maximum size over all decompressed and extracted files.
WithMaxFiles options pattern function to set the maximum number of files, directories, and symlinks extracted during the extraction.
WithMaxInputSize options pattern function to set MaxInputSize for extraction input file.
WithNoUntarAfterDecompression options pattern function to enable/disable combined tar.gz extraction.
WithOverwrite options pattern function to specify whether files should be overwritten in the destination.
WithPatterns options pattern function to set the filepath patterns that files need to match to be extracted.
WithPreserveOwner options pattern function to preserve the owner of the extracted files.
WithTelemetryHook options pattern function to set a [TelemetryHook], which is called after extraction.

# Variables

ErrFailedToReadHeader is returned when the header of the file cannot be read.
ErrFailedToExtract is returned when the file cannot be extracted.
ErrMaxExtractionSizeExceeded indicates that the maximum size is exceeded.
ErrMaxFilesExceeded indicates that the maximum number of files is exceeded.
ErrNoExtractorFound is returned when no extractor is found for the given file type.
ErrUnsupportedFile is an error that indicates that the file is not supported.
ErrUnsupportedFileType is returned when the file type is not supported.

# Structs

Config provides a configuration struct and options to adjust the configuration.
TargetDisk is the struct type that holds all information for interacting with the filesystem.
TargetMemory is an in-memory filesystem implementation that can be used to create, read, and write files in memory.
TelemetryData holds all telemetry data of an extraction.
UnsupportedFileError is an error that indicates that the file is not supported.

# Interfaces

Target specifies all functions that need to be implemented to extract contents from an archive.

# Type aliases

ConfigOption is a function pointer to implement the option pattern.
TelemetryHook is a function type that performs operations on [TelemetryData] after an extraction has finished. It can be used, for example, to submit the [TelemetryData] to a telemetry service.