# README

PullMan

PullMan manages file pulls. Its primary use case is a long-running process whose remote repositories are configured dynamically and whose concurrent pulls from those repositories must be handled efficiently.

This project is a work in progress.

API

There is a single method in the functional API:

func (p *PullManager) Pull(ctx context.Context, pc PullCommand) error

The PullCommand contains all the information needed to process a request to pull resources. A PullManager instance is intended to be shared and used concurrently by multiple goroutines calling Pull().

See Concepts for details.

Example Usage

package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/go-logr/zapr"
	"go.uber.org/zap"

	"github.com/kserve/modelmesh-runtime-adapter/pullman"
	// blank import registers the http storage provider
	_ "github.com/kserve/modelmesh-runtime-adapter/pullman/storageproviders/http"
)

func main() {
	// set up the logger
	zaplog, err := zap.NewDevelopment()
	if err != nil {
		panic("Error creating logger...")
	}
	// create a manager
	manager := pullman.NewPullManager(zapr.NewLogger(zaplog))

	// construct the PullCommand
	configJSON := []byte(`{
		"type": "http",
		"url": "http://httpbin.org"
	}`)
	rc := &pullman.RepositoryConfig{}
	if err := json.Unmarshal(configJSON, rc); err != nil {
		fmt.Printf("Failed to parse repository config: %v\n", err)
		return
	}

	pts := []pullman.Target{
		{
			// no LocalPath: the local file name is derived from RemotePath
			RemotePath: "uuid",
		},
		{
			RemotePath: "/image/jpeg",
			LocalPath:  "random_image.jpg",
		},
	}

	pc := pullman.PullCommand{
		RepositoryConfig: rc,
		Directory:        "./output",
		Targets:          pts,
	}

	pullErr := manager.Pull(context.Background(), pc)
	if pullErr != nil {
		fmt.Printf("Failed to pull files: %v\n", pullErr)
	}
}

Executing the above code results in two files being downloaded:

  • a random JPEG image at output/random_image.jpg
  • a file containing JSON with a uuid at output/uuid

Concepts

Storage Provider

The StorageProvider interface abstracts creating clients to a remote service that files can be pulled from. For generic providers, a service can be identified by the communication protocol (s3, http, ftp, etc). A StorageProvider is identified by a string type, and available storage providers are typically registered with PullMan at boot-up via an init function:

func init() {
	p := Provider{
		// some configurations for the provider
	}
	pullman.RegisterProvider(providerType, p)
}

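Because registration happens in an init function, an application controls which provider implementations are available by choosing which provider packages it imports (the example above uses a blank import of the http provider). A StorageProvider is a factory for RepositoryClients and creates them from a provider-specific configuration abstracted as a Config.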
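
To make the registration pattern concrete, here is a minimal, self-contained sketch of a type-keyed provider registry populated from an init function. It is not PullMan's implementation: the `Provider` interface, `register`, `lookup`, and `httpProvider` are hypothetical names, and rejecting duplicate registrations is an assumption made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a hypothetical stand-in for the StorageProvider interface.
type Provider interface {
	Name() string
}

var (
	mu       sync.RWMutex
	registry = map[string]Provider{} // provider type -> implementation
)

// register adds a provider under the given type string.
// Assumption for this sketch: registering the same type twice is an error.
func register(providerType string, p Provider) error {
	mu.Lock()
	defer mu.Unlock()
	if _, exists := registry[providerType]; exists {
		return fmt.Errorf("provider type %q already registered", providerType)
	}
	registry[providerType] = p
	return nil
}

// lookup returns the provider registered for the given type, if any.
func lookup(providerType string) (Provider, bool) {
	mu.RLock()
	defer mu.RUnlock()
	p, ok := registry[providerType]
	return p, ok
}

// httpProvider is a placeholder implementation used only for illustration.
type httpProvider struct{}

func (httpProvider) Name() string { return "http" }

// init runs before main, so the provider is available at start-up.
func init() {
	if err := register("http", httpProvider{}); err != nil {
		panic(err)
	}
}

func main() {
	if p, ok := lookup("http"); ok {
		fmt.Println("found provider:", p.Name())
	}
	_, ok := lookup("s3")
	fmt.Println("s3 registered:", ok)
}
```

In a real application the init function would typically live in the provider's own package, so that importing the package is what makes the provider available.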

Repository Client

A RepositoryClient encapsulates the connections to a remote service and knows how to pull resources from it.

Creating and updating RepositoryClient instances can happen dynamically and asynchronously from pulling any resources. PullMan manages a cache of RepositoryClients based on requests it has processed and will re-use clients where possible.
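
The client re-use described above can be pictured as a get-or-create cache keyed by a string derived from the repository configuration. The following is a simplified sketch under that assumption, not PullMan's actual cache: `client`, `clientCache`, `newClientCache`, and the key format are hypothetical, and a single mutex is used for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for a RepositoryClient.
type client struct{ key string }

// clientCache hands out one client per configuration key.
type clientCache struct {
	mu      sync.Mutex
	clients map[string]*client
	created int // number of clients built so far, kept for illustration
}

func newClientCache() *clientCache {
	return &clientCache{clients: map[string]*client{}}
}

// get returns the cached client for key, building one with newClient on a miss.
func (c *clientCache) get(key string, newClient func(string) (*client, error)) (*client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[key]; ok {
		return cl, nil // re-use the existing client
	}
	cl, err := newClient(key)
	if err != nil {
		return nil, err // failed builds are not cached
	}
	c.clients[key] = cl
	c.created++
	return cl, nil
}

func main() {
	cache := newClientCache()
	build := func(key string) (*client, error) { return &client{key: key}, nil }

	// Two requests with the same (assumed) key share one client.
	a, _ := cache.get("http|http://httpbin.org", build)
	b, _ := cache.get("http|http://httpbin.org", build)
	fmt.Println("same client:", a == b, "clients built:", cache.created)
}
```

Holding the lock while building a client keeps the sketch short; a production cache would likely avoid blocking unrelated keys during client creation.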

Pull Command

A PullCommand contains all the needed information for PullMan to process a request to pull resources. Both remote and local resources are identified by paths. LocalPath is a filesystem path, and RemotePath is always composed of segments separated by forward slashes. The RemotePath may point to a single resource or an abstraction pointing to multiple resources (analogous to a directory). The definition of a "directory" may be different for different storage providers but must always be compatible with a filesystem path. For example, an HTTP request that gets a multipart/form-data body as a response could result in writing multiple files when pulling that resource.

// Represents the request to the puller to be fulfilled
type PullCommand struct {
	// repository from which files will be pulled
	RepositoryConfig Config
	// local directory where files will be pulled to
	Directory string
	// the list of targets to be pulled
	Targets []Target
}

type Target struct {
	// remote path to the desired resource(s)
	RemotePath string
	// path to local file to pull the resource to (may have default based on RemotePath)
	LocalPath string
}

# Functions

Helper functions that give consistent behavior when working with Configs.
HashStrings generates a hash from the concatenation of the passed strings. It provides a common way for providers to implement GetKey when some configuration values are considered secret.
OpenFile will check the path and the filesystem for mismatch errors.
RegisterProvider should only be called when initializing the application.

# Structs

Represents the command sent to PullMan to be fulfilled.
Generic config abstraction used by PullMan.

# Interfaces

Config represents simple key/value configuration with a type/class.
A RepositoryClient is the worker that executes a PullCommand.
A StorageProvider is a factory for RepositoryClients.