Module: github.com/hashicorp/go-extract
Category: module, package
Version: 0.7.1
Repository: https://github.com/hashicorp/go-extract.git
Documentation: pkg.go.dev

# README

go-extract

Secure file decompression and extraction of the following types:

  • 7-Zip
  • Brotli
  • Bzip2
  • GZip
  • LZ4
  • Snappy
  • Tar
  • Xz
  • Zip
  • Zlib
  • Zstandard

Code Example

Add to go.mod:

GOPRIVATE=github.com/hashicorp/go-extract go get github.com/hashicorp/go-extract

Usage in code:


import (
    ...
    "github.com/hashicorp/go-extract"
    "github.com/hashicorp/go-extract/config"
    "github.com/hashicorp/go-extract/telemetry"
    ...
)

...


    // open archive
    archive, _ := os.Open(...)

    // prepare context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), (time.Second * time.Duration(MaxExtractionTime)))
    defer cancel()

    // prepare logger
    logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
      Level: slog.LevelInfo,
    }))

    // setup telemetry hook
    telemetryToLog := func(ctx context.Context, td telemetry.Data) {
      logger.Info("extraction finished", "telemetryData", td)
    }

    // prepare config (these are the default values)
    cfg := config.NewConfig(
        config.WithCacheInMemory(false),              // cache to disk if input is a zip in a stream
        config.WithContinueOnError(false),            // fail on error
        config.WithContinueOnUnsupportedFiles(false), // fail on unsupported files
        config.WithCreateDestination(false),          // do not try to create specified destination
        config.WithCustomCreateDirMode(0750),         // for folders not listed in the archive (respecting umask), default: drwxr-x---
        config.WithCustomDecompressFileMode(0640),    // for decompressed files (respecting umask), default: -rw-r-----
        config.WithDenySymlinkExtraction(false),      // allow symlink creation
        config.WithExtractType("<ext>"),              // explicitly specify a file extension to determine the extractor
        config.WithFollowSymlinks(false),             // do not follow symlinks during creation
        config.WithLogger(logger),                    // adjust logger (default: io.Discard)
        config.WithMaxExtractionSize(1 << (10 * 3)),  // limit to 1 GiB (disable check: -1)
        config.WithMaxFiles(1000),                    // only 1k files (including folders and symlinks) maximum (disable check: -1)
        config.WithMaxInputSize(1 << (10 * 3)),       // limit to 1 GiB (disable check: -1)
        config.WithNoUntarAfterDecompression(false),  // extract tar.gz combined
        config.WithOverwrite(false),                  // don't replace existing files
        config.WithPatterns("*.tf","modules/*.tf"),   // normally, no patterns predefined
        config.WithTelemetryHook(telemetryToLog),     // adjust hook to receive telemetry from extraction
    )

    // extract archive (cfg avoids shadowing the imported config package)
    if err := extract.Unpack(ctx, archive, destinationPath, cfg); err != nil {
      // handle error
    }

...
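
To make the idea behind a limit such as config.WithMaxExtractionSize more concrete, the following is a minimal standard-library sketch of capping how many bytes are copied out of a (decompression) stream. It is not go-extract's implementation; copyWithLimit, extractString and errSizeExceeded are hypothetical names introduced only for this illustration, and the convention that a negative limit disables the check is borrowed from the "disable check: -1" comments above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errSizeExceeded is a hypothetical sentinel error for this sketch.
var errSizeExceeded = errors.New("maximum extraction size exceeded")

// copyWithLimit copies src to dst and writes at most limit bytes.
// If src holds more than limit bytes, it returns errSizeExceeded.
// A negative limit disables the check.
func copyWithLimit(dst io.Writer, src io.Reader, limit int64) (int64, error) {
	if limit < 0 {
		return io.Copy(dst, src)
	}
	n, err := io.CopyN(dst, src, limit)
	if err == io.EOF {
		return n, nil // source ended before the limit was reached
	}
	if err != nil {
		return n, err
	}
	// Exactly limit bytes were copied; probe for one more byte to find
	// out whether the source is larger than allowed.
	var probe [1]byte
	if _, err := io.ReadFull(src, probe[:]); err == nil {
		return n, errSizeExceeded
	} else if err != io.EOF {
		return n, err
	}
	return n, nil
}

// extractString is a small demo wrapper that runs copyWithLimit on strings.
func extractString(input string, limit int64) (string, error) {
	var out strings.Builder
	_, err := copyWithLimit(&out, strings.NewReader(input), limit)
	return out.String(), err
}

func main() {
	_, err := extractString("0123456789", 4)
	fmt.Println("10 bytes, limit 4:", err)

	out, err := extractString("abc", 4)
	fmt.Println("3 bytes, limit 4:", out, err)
}
```

Checking while copying, rather than trusting a size field in the archive header, means the limit holds even when the header understates the decompressed size.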

Tip: If the library is used in a cgroup memory-limited execution environment to extract Zip archives that are cached in memory (config.WithCacheInMemory(true)), make sure that GOMEMLIMIT is set in the execution environment to avoid out-of-memory (OOM) errors.

Example:

$ export GOMEMLIMIT=1GiB
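
As an alternative to the environment variable, the same soft limit can be set from inside a program with runtime/debug.SetMemoryLimit from the Go standard library (available since Go 1.19). The sketch below is independent of go-extract; applyMemoryLimit is a hypothetical helper name, and the 1 GiB value simply mirrors the example above and should be chosen below the actual cgroup limit.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit sets the Go runtime's soft memory limit to limitBytes
// and returns the limit that is in effect afterwards.
func applyMemoryLimit(limitBytes int64) int64 {
	debug.SetMemoryLimit(limitBytes)
	// A negative argument does not change the limit; it only reports it.
	return debug.SetMemoryLimit(-1)
}

func main() {
	const oneGiB = 1 << 30 // same value as GOMEMLIMIT=1GiB
	fmt.Printf("soft memory limit: %d bytes\n", applyMemoryLimit(oneGiB))
}
```

The limit is soft: the garbage collector works harder as the heap approaches it, but the runtime does not refuse allocations, so it reduces rather than eliminates the risk of the process being killed.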

CLI Tool

You can use this library on the command line with the goextract command.

Installation

GOPRIVATE=github.com/hashicorp/go-extract go install github.com/hashicorp/go-extract/cmd/goextract@latest

Manual Build and Installation

git clone git@github.com:hashicorp/go-extract.git
cd go-extract
make
make test
make install

Usage

$ goextract -h
Usage: goextract <archive> [<destination>] [flags]

A secure extraction utility

Arguments:
  <archive>          Path to archive. ("-" for STDIN)
  [<destination>]    Output directory/file.

Flags:
  -h, --help                               Show context-sensitive help.
  -C, --continue-on-error                  Continue extraction on error.
  -S, --continue-on-unsupported-files      Skip extraction of unsupported files.
  -c, --create-destination                 Create destination directory if it does not exist.
      --custom-create-dir-mode=750         File mode for created directories, which are not listed in the archive. (respecting umask)
      --custom-decompress-file-mode=640    File mode for decompressed files. (respecting umask)
  -D, --deny-symlinks                      Deny symlink extraction.
  -F, --follow-symlinks                    [Dangerous!] Follow symlinks to directories during extraction.
      --max-files=1000                     Maximum files that are extracted before stop. (disable check: -1)
      --max-extraction-size=1073741824     Maximum extraction size that allowed is (in bytes). (disable check: -1)
      --max-extraction-time=60             Maximum time that an extraction should take (in seconds). (disable check: -1)
      --max-input-size=1073741824          Maximum input size that allowed is (in bytes). (disable check: -1)
  -N, --no-untar-after-decompression       Disable combined extraction of tar.gz.
  -O, --overwrite                          Overwrite if exist.
  -P, --pattern=PATTERN,...                Extracted objects need to match shell file name pattern.
  -T, --telemetry                          Print telemetry data to log after extraction.
  -t, --type=""                            Type of archive. (7z, br, bz2, gz, lz4, sz, tar, tgz, xz, zip, zst, zz)
  -v, --verbose                            Verbose logging.
  -V, --version                            Print release version information.

Telemetry data

It is possible to collect telemetry data either by specifying a telemetry hook via the config option config.WithTelemetryHook(telemetryToLog) or by passing the CLI flag -T, --telemetry.

Here is an example of the telemetry data collected for the extraction of terraform-aws-iam-5.34.0.tar.gz:

{
  "LastExtractionError": "",
  "ExtractedDirs": 51,
  "ExtractionDuration": 48598584,
  "ExtractionErrors": 0,
  "ExtractedFiles": 241,
  "ExtractionSize": 539085,
  "ExtractedSymlinks": 0,
  "ExtractedType": "tar+gzip",
  "InputSize": 81477,
  "PatternMismatches": 0,
  "UnsupportedFiles": 0,
  "LastUnsupportedFile": ""
}
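
If the telemetry is logged as JSON like the record above, it can be read back with the standard library for reporting. The following is a sketch under stated assumptions: telemetryRecord is a hypothetical local type that mirrors the JSON keys shown above (it is not the library's telemetry.Data type), and ExtractionDuration is assumed to be in nanoseconds, which is how a Go time.Duration is normally serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// telemetryRecord is a hypothetical local type mirroring the JSON keys of
// the example record; it is not part of go-extract.
type telemetryRecord struct {
	LastExtractionError string
	ExtractedDirs       int64
	ExtractionDuration  time.Duration // assumption: nanoseconds
	ExtractionErrors    int64
	ExtractedFiles      int64
	ExtractionSize      int64
	ExtractedSymlinks   int64
	ExtractedType       string
	InputSize           int64
	PatternMismatches   int64
	UnsupportedFiles    int64
	LastUnsupportedFile string
}

// parseTelemetry decodes one JSON telemetry record.
func parseTelemetry(raw string) (telemetryRecord, error) {
	var rec telemetryRecord
	err := json.Unmarshal([]byte(raw), &rec)
	return rec, err
}

// expansionRatio reports extracted bytes per input byte (0 if unknown).
func (r telemetryRecord) expansionRatio() float64 {
	if r.InputSize == 0 {
		return 0
	}
	return float64(r.ExtractionSize) / float64(r.InputSize)
}

// sample repeats a subset of the example record from the README.
const sample = `{
  "ExtractedDirs": 51,
  "ExtractionDuration": 48598584,
  "ExtractedFiles": 241,
  "ExtractionSize": 539085,
  "ExtractedType": "tar+gzip",
  "InputSize": 81477
}`

func main() {
	rec, err := parseTelemetry(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: %d files, %d dirs in %s (expansion ratio %.2f)\n",
		rec.ExtractedType, rec.ExtractedFiles, rec.ExtractedDirs,
		rec.ExtractionDuration, rec.expansionRatio())
}
```

Under the nanosecond assumption, the example extraction took roughly 48.6 ms and expanded the input by a factor of about 6.6.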

Feature collection

  • Filetypes
    • zip (/jar)
    • tar
    • gzip
    • tar.gz
    • brotli
    • bzip2
    • flate
    • xz
    • snappy
    • rar
    • 7zip
    • zstandard
    • zlib
    • lz4
  • extraction size check
  • max num of extracted files
  • extraction time exhaustion
  • input file size limitations
  • context based cancelation
  • option pattern for configuration
  • io.Reader as source
  • symlink inside archive
  • symlink to outside is detected
  • symlink with absolute path is detected
  • file with path traversal is detected
  • file with absolute path is detected
  • filetype detection based on magic bytes
  • windows support
  • tests for gzip
  • function documentation
  • check for windows
  • Allow/deny symlinks in general
  • Telemetry call back function
  • Extraction filter with unix file name patterns
  • Cache input on disk (only relevant if <archive> is a zip archive that is read from a stream)
  • Optionally cache input in memory instead (as with caching on disk, only relevant for zip archives that are consumed from a stream)
  • Handle passwords
  • recursive extraction
  • virtual fs as target
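
Two items in the list above, detecting entries with path traversal and entries with absolute paths, can be illustrated with a purely lexical check from the standard library. This is a sketch of the general technique, not go-extract's code: securePath is a hypothetical helper, it relies on filepath.IsLocal (Go 1.20 or later), and it does not cover symlinks that already exist on disk, which need separate handling.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// securePath maps an archive entry name to a path below dst.
// It rejects names that are empty, absolute, or that escape dst via "..".
// The check is lexical only: it does not resolve symlinks on disk.
func securePath(dst, name string) (string, error) {
	// Archive formats use forward slashes; convert to the OS separator.
	local := filepath.FromSlash(name)
	if !filepath.IsLocal(local) {
		return "", fmt.Errorf("unsafe path in archive entry: %q", name)
	}
	return filepath.Join(dst, local), nil
}

func main() {
	for _, name := range []string{
		"modules/main.tf",  // regular entry
		"../../etc/passwd", // path traversal
		"/etc/passwd",      // absolute path
		"a/../../escape",   // traversal hidden behind a directory
	} {
		target, err := securePath("out", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", target)
	}
}
```

Validating the name before joining it to the destination keeps the decision independent of what already exists in the target directory.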

# Functions

GetUnpackFunction identifies the correct extractor based on magic bytes.
GetUnpackFunctionByFileName identifies the correct extractor based on file extension.
IsKnownArchiveFileExtension checks if the given file extension is a known archive file extension.
Unpack reads data from src and identifies whether it is a known archive type.
ValidTypes returns a string with all available types.
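
Detection by magic bytes, as described for GetUnpackFunction, generally means comparing the first bytes of the input against known format signatures. The sketch below shows that idea with a handful of well-known signatures; the signatures table and detectType are hypothetical and are not the library's own lookup. Tar is left out on purpose, because its "ustar" marker sits at offset 257 rather than at the start of the file.

```go
package main

import (
	"bytes"
	"fmt"
)

// signatures lists a few well-known leading byte sequences.
// The type names follow the CLI's --type values.
var signatures = []struct {
	name  string
	magic []byte
}{
	{"gz", []byte{0x1f, 0x8b}},
	{"bz2", []byte("BZh")},
	{"zip", []byte("PK\x03\x04")},
	{"xz", []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}},
	{"7z", []byte{'7', 'z', 0xbc, 0xaf, 0x27, 0x1c}},
	{"zst", []byte{0x28, 0xb5, 0x2f, 0xfd}},
}

// detectType returns the type whose signature prefixes header,
// or an empty string if none matches.
func detectType(header []byte) string {
	for _, s := range signatures {
		if bytes.HasPrefix(header, s.magic) {
			return s.name
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", detectType([]byte{0x1f, 0x8b, 0x08, 0x00}))
	fmt.Printf("%q\n", detectType([]byte("PK\x03\x04...")))
	fmt.Printf("%q\n", detectType([]byte("plain text")))
}
```

Content-based detection is what allows an archive to be identified when it arrives as a stream without a file name; a name-based lookup such as GetUnpackFunctionByFileName covers the case where only the extension is known.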

# Constants

Available file types.

# Interfaces

Extractor is an interface that defines all functions that need to be implemented by an extraction engine.