# Constants
AllModelJob represents all model job types.
AllServingJob represents all serving job types.
AllTrainingJob represents all training job types.
CronTFTrainingJob defines the cron tfjob.
CustomServingJob defines the custom serving job.
DeepSpeedTrainingJob defines the deepspeed job.
DistributedServingJob defines the distributed serving job.
ETTrainingJob defines the etjob.
EvaluateJob defines the evaluate job.
HorovodTrainingJob defines the horovod job.
JobCreated means the job has been accepted by the system, but one or more of the pods/services has not been started.
JobFailed means one or more sub-resources (e.g.
JobRestarting means one or more sub-resources (e.g.
JobRunning means all sub-resources (e.g.
JobSucceeded means all sub-resources (e.g.
Json defines the json format.
KFServingJob defines the kfserving job.
KServeJob defines the kserve job.
ModelBenchmarkJob defines the model benchmark job.
ModelEvaluateJob defines the model evaluate job.
ModelJobComplete means the job is complete.
ModelJobFailed means the job has failed.
ModelJobPending means the job is pending.
ModelJobRunning means the job is running.
ModelJobUnknown means the job status is unknown.
ModelOptimizeJob defines the model optimize job.
ModelProfileJob defines the model profile job.
MPITrainingJob defines the mpijob.
defines the nvidia resource name.
PytorchTrainingJob defines the pytorchjob.
RayJob defines the ray job.
SeldonServingJob defines the seldon core job.
SparkTrainingJob defines the spark job.
TFServingJob defines the tensorflow serving job.
TFTrainingJob defines the tfjob.
TrainingJobFailed means the job has failed.
TrainingJobPending means the job is pending.
TrainingJobQueuing means the job is queuing.
TrainingJobRunning means the job is running.
TrainingJobSucceeded means the job has succeeded.
TritonServingJob defines the nvidia triton server job.
TRTServingJob defines the tensorrt serving job.
Unknown defines the unknown format.
UnknownModelJob defines the unknown model job.
UnknownServingJob defines the unknown serving job.
UnknownTrainingJob defines the unknown training job.
VolcanoTrainingJob defines the volcano job.
Wide defines the wide format.
Yaml defines the yaml format.
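
The status constants above follow the common Go pattern of a string-backed type with a fixed set of named values. A minimal sketch of that pattern is shown here; the string values and the `isTerminal` helper are assumptions made for illustration and are not taken from the package.

```go
package main

import "fmt"

// TrainingJobStatus reuses the type name listed in this reference.
// The underlying string values below are assumed, not the package's own.
type TrainingJobStatus string

const (
	TrainingJobPending   TrainingJobStatus = "PENDING"
	TrainingJobQueuing   TrainingJobStatus = "QUEUING"
	TrainingJobRunning   TrainingJobStatus = "RUNNING"
	TrainingJobSucceeded TrainingJobStatus = "SUCCEEDED"
	TrainingJobFailed    TrainingJobStatus = "FAILED"
)

// isTerminal is a hypothetical helper: it reports whether a status
// is final, meaning the job will not change state again.
func isTerminal(s TrainingJobStatus) bool {
	switch s {
	case TrainingJobSucceeded, TrainingJobFailed:
		return true
	default:
		return false
	}
}

func main() {
	statuses := []TrainingJobStatus{
		TrainingJobPending, TrainingJobQueuing, TrainingJobRunning,
		TrainingJobSucceeded, TrainingJobFailed,
	}
	for _, s := range statuses {
		fmt.Printf("%s terminal=%t\n", s, isTerminal(s))
	}
}
```

Using a distinct type rather than a bare `string` lets the compiler reject a serving status passed where a training status is expected.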
# Variables
ModelTypeMap collects the model job types and their aliases.
ServingTypeMap collects the serving job types and their aliases.
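
A type map of this kind can be used to resolve user input to a canonical job type. The sketch below shows one way such a lookup might work; the `typeInfo` struct, its fields, the map entries, and `lookupServingType` are all hypothetical and do not reflect the package's actual definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// ServingJobType reuses the type name listed in this reference.
type ServingJobType string

// typeInfo is a hypothetical record holding the alternative names of a job type.
type typeInfo struct {
	Alias     string
	Shorthand string
}

// servingTypes is an illustrative stand-in for a type map; the entries are assumed.
var servingTypes = map[ServingJobType]typeInfo{
	"tf-serving":     {Alias: "Tensorflow", Shorthand: "tf"},
	"custom-serving": {Alias: "Custom", Shorthand: "custom"},
}

// lookupServingType resolves a name, alias, or shorthand (case-insensitive)
// to its canonical job type. The boolean is false when nothing matches.
func lookupServingType(s string) (ServingJobType, bool) {
	for t, info := range servingTypes {
		if strings.EqualFold(s, string(t)) ||
			strings.EqualFold(s, info.Alias) ||
			strings.EqualFold(s, info.Shorthand) {
			return t, true
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{"tf", "Custom", "tf-serving", "nope"} {
		t, ok := lookupServingType(in)
		fmt.Printf("%q -> %q (found=%t)\n", in, t, ok)
	}
}
```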
# Structs
AutoscalerOptions specifies optional configuration for the Ray autoscaler.
CommonSubmitArgs defines the common parts of the submit arguments.
ConfigFileInfo defines the config files which will be mounted to containers.
DataDirVolume defines the volume of kubernetes.
HeadGroupSpec is the spec for the head pod.
LimitedPodSecurityContext defines the kubernetes pod security context.
PrometheusServer is used to define prometheus server.
Model Management.
ServingJobInfo displays serving job information.
SubmitTensorboardArgs is used to store TensorBoard information.
TrainingJobInfo stores training job information.
TrainingJobInstance defines the instance of training job.
WorkerGroupSpec defines the specs for the worker pods.
# Type aliases
ConcurrencyPolicy describes how the job will be handled.
CronType defines the supported cron job types.
PrintFormatStyle defines the output format; it is only used in cmd.
JobConditionType defines the possible condition types of a JobStatus.
ModelJobStatus defines the possible statuses of a model job.
ModelJobType defines the supported model job types.
key of map is device id.
ServingJobType defines the serving job type; its name must have the form shorthand + "-serving".
TrainingJobStatus defines the possible statuses of a training job.
TrainingJobType defines the supported training job types.
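
As an illustration of how a format type such as PrintFormatStyle can be consumed by a command-line layer, the sketch below maps a raw flag value onto the Wide, Json, Yaml, and Unknown constants. The string values and the `parsePrintFormat` helper are assumptions for this example, not the package's API.

```go
package main

import (
	"fmt"
	"strings"
)

// PrintFormatStyle reuses the type name listed in this reference.
// The underlying string values are assumed for illustration.
type PrintFormatStyle string

const (
	Wide    PrintFormatStyle = "wide"
	Json    PrintFormatStyle = "json"
	Yaml    PrintFormatStyle = "yaml"
	Unknown PrintFormatStyle = "unknown"
)

// parsePrintFormat is a hypothetical helper that normalizes a flag value
// (trimming whitespace, ignoring case) and falls back to Unknown.
func parsePrintFormat(s string) PrintFormatStyle {
	switch PrintFormatStyle(strings.ToLower(strings.TrimSpace(s))) {
	case Wide:
		return Wide
	case Json:
		return Json
	case Yaml:
		return Yaml
	default:
		return Unknown
	}
}

func main() {
	for _, in := range []string{"json", " YAML ", "wide", "table"} {
		fmt.Printf("%q -> %s\n", in, parsePrintFormat(in))
	}
}
```

Returning an explicit Unknown value, rather than an error, lets the caller decide whether an unrecognized format should be rejected or replaced with a default.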