MPIJob

Reference documentation for MPIJob

Packages:

kubeflow.org

Package v1alpha2 is the v1alpha2 version of the API.

Resource Types:

MPIJob

Represents an MPIJob resource.

Field Description
apiVersion
string
kubeflow.org/v1alpha2
kind
string
MPIJob
metadata
Kubernetes meta/v1.ObjectMeta

Standard Kubernetes object’s metadata.

Refer to the Kubernetes API documentation for the fields of metadata.
spec
MPIJobSpec

Specification of the desired state of the MPIJob.



activeDeadlineSeconds
int64
(Optional)

Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.

backoffLimit
int32
(Optional)

Specifies the number of retries before the job is marked as failed.

cleanPodPolicy
common/v1.CleanPodPolicy

Defines the policy for cleaning up pods after the MPIJob completes. Defaults to None.

slotsPerWorker
int32
(Optional)

Specifies the number of slots per worker used in the hostfile. Defaults to 1.

mainContainer
string
(Optional)

Specifies the name of the main container that executes the MPI code.

runPolicy
common/v1.RunPolicy
(Optional)

Encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

mpiReplicaSpecs
map[github.com/kubeflow/mpi-operator/pkg/apis/kubeflow/v1alpha2.MPIReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec

A map of MPIReplicaType (key) to ReplicaSpec (value). Specifies the MPI cluster configuration. For example, { "Launcher": ReplicaSpec, "Worker": ReplicaSpec }

status
common/v1.JobStatus

Most recently observed status of the MPIJob. Read-only (modified by the system).
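
As a rough illustration of how these fields fit together, the following Python sketch assembles an MPIJob manifest as a plain dictionary and serializes it to JSON, which the Kubernetes API accepts alongside YAML. The job name and container image are hypothetical placeholders, and the `replicas` and `template` keys inside each replica spec are assumed from the common/v1.ReplicaSpec type, which this page does not document.

```python
import json


def build_mpijob(name, workers, slots_per_worker=1):
    """Return an MPIJob manifest as a dict (illustrative sketch only)."""

    def replica(replicas, container_name):
        # Keys here are assumed from common/v1.ReplicaSpec; the image is a
        # placeholder, not a value taken from this reference.
        return {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "image": "example.com/mpi-app:latest",
                        }
                    ]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": slots_per_worker,
            "cleanPodPolicy": "None",
            "mpiReplicaSpecs": {
                "Launcher": replica(1, "launcher"),
                "Worker": replica(workers, "worker"),
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be inspected or piped to a file.
    print(json.dumps(build_mpijob("demo-job", workers=2), indent=2))
```

The status field is omitted on purpose, since it is read-only and populated by the system.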

MPIJobSpec

(Appears on: MPIJob)

MPIJobSpec describes the desired state of the MPIJob.

Field Description
activeDeadlineSeconds
int64
(Optional)

Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.

backoffLimit
int32
(Optional)

Specifies the number of retries before the job is marked as failed.

cleanPodPolicy
common/v1.CleanPodPolicy

Defines the policy for cleaning up pods after the MPIJob completes. Defaults to None.

slotsPerWorker
int32
(Optional)

Specifies the number of slots per worker used in the hostfile. Defaults to 1.

mainContainer
string
(Optional)

Specifies the name of the main container that executes the MPI code.

runPolicy
common/v1.RunPolicy
(Optional)

Encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

mpiReplicaSpecs
map[github.com/kubeflow/mpi-operator/pkg/apis/kubeflow/v1alpha2.MPIReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec

A map of MPIReplicaType (key) to ReplicaSpec (value). Specifies the MPI cluster configuration. For example, { "Launcher": ReplicaSpec, "Worker": ReplicaSpec }
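
The defaults and constraints listed for MPIJobSpec can be checked on the client side before a manifest is submitted. The sketch below is a hypothetical helper, not part of the operator: `normalize_spec` fills in the documented defaults for slotsPerWorker and cleanPodPolicy and rejects a non-positive activeDeadlineSeconds, while `total_slots` multiplies a caller-supplied worker count by slotsPerWorker, on the assumption that every worker contributes the same number of hostfile slots.

```python
def normalize_spec(spec):
    """Return a copy of an MPIJobSpec dict with the documented defaults applied."""
    out = dict(spec)
    out.setdefault("slotsPerWorker", 1)       # documented default
    out.setdefault("cleanPodPolicy", "None")  # documented default

    deadline = out.get("activeDeadlineSeconds")
    if deadline is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(deadline, int) and not isinstance(deadline, bool)
        if not is_int or deadline <= 0:
            raise ValueError("activeDeadlineSeconds must be a positive integer")
    return out


def total_slots(spec, worker_replicas):
    """Total hostfile slots, assuming each worker gets slotsPerWorker slots."""
    return worker_replicas * spec.get("slotsPerWorker", 1)
```

For instance, four workers with slotsPerWorker set to 2 would give eight slots under this assumption.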

MPIReplicaType (string alias)

MPIReplicaType is the type for MPIReplica. It can be either “Launcher” or “Worker”.
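
Because MPIReplicaType is a string alias with two permitted values, a client can model it as an enumeration and use it to check the keys of mpiReplicaSpecs. The following is a minimal sketch; the helper name `check_replica_keys` is hypothetical, and the comparison assumes keys must match the documented spelling exactly.

```python
from enum import Enum


class MPIReplicaType(str, Enum):
    """The two replica types named in this reference."""

    LAUNCHER = "Launcher"
    WORKER = "Worker"


def check_replica_keys(mpi_replica_specs):
    """Raise ValueError if any key is not a documented MPIReplicaType."""
    allowed = {t.value for t in MPIReplicaType}
    unknown = sorted(set(mpi_replica_specs) - allowed)
    if unknown:
        raise ValueError(f"unknown replica types: {unknown}")
    return True
```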