Authenticating Pipelines to GCP

Authentication and authorization to Google Cloud Platform (GCP) in Pipelines

This page describes how Kubeflow Pipelines authenticates to GCP. The options listed below have different tradeoffs; choose the one that fits your use case.

The Compute Engine default service account is easy to set up, but it grants overly broad permissions when the cluster has access to the “cloud-platform” scope. It is therefore not suitable for a shared GCP project.
Workload Identity takes more effort to set up, but allows fine-grained permission control. It is recommended for production use cases.
Google service account keys stored as Kubernetes secrets are the legacy approach and are no longer recommended on GKE. However, this is the only option for using GCP APIs when your cluster is an Anthos or on-prem cluster.

NOTE: AI Platform Pipelines supports only the Compute Engine default service account out of the box. If you want a custom configuration, we recommend using Pipelines Standalone instead. For details, refer to the AI Platform Pipelines documentation.

Before you begin

Installation Options for Kubeflow Pipelines introduces the options for installing Pipelines. Be aware that authentication support and cluster setup instructions vary depending on which option you used to install Kubeflow Pipelines.

Compute Engine default service account

This option is good for trying out Kubeflow Pipelines because it is easy to set up, but it does not support permission separation for workloads in the cluster.

NOTE: Using pipelines with the Compute Engine default service account is not supported in a Full Kubeflow deployment.

Cluster setup to use Compute Engine default service account

By default, your GKE nodes use the Compute Engine default service account. If you allowed the cloud-platform scope when creating the cluster, Kubeflow Pipelines can authenticate to GCP and manage resources in your project without further configuration.

Use one of the following options to create a GKE cluster that uses the Compute Engine default service account:

If you followed the instructions in Setting up AI Platform Pipelines and checked Allow access to the following Cloud APIs, your cluster is already using the Compute Engine default service account.
In the Google Cloud Console UI, you can enable it in Create a Kubernetes cluster -> default-pool -> Security -> Access scopes -> Allow full access to all Cloud APIs.
Using the gcloud CLI, you can enable it with --scopes cloud-platform like the following:

gcloud container clusters create cluster-name \
  --scopes cloud-platform

Please refer to gcloud container clusters create command documentation for other available options.
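
To confirm that an existing cluster was created with the cloud-platform scope, a check along the following lines may help. This is a minimal sketch: has_cloud_platform_scope is a hypothetical helper name, and the commented usage assumes that gcloud prints the node scopes as a list separated by semicolons, commas, or whitespace.

```shell
# Hypothetical helper: succeeds if a scope list contains the exact
# cloud-platform scope. It does not match cloud-platform.read-only.
has_cloud_platform_scope() {
  local scope
  # Normalize semicolon and comma separators to spaces, then compare each entry.
  for scope in $(printf '%s' "$1" | tr ';,' '  '); do
    if [ "$scope" = "https://www.googleapis.com/auth/cloud-platform" ]; then
      return 0
    fi
  done
  return 1
}

# Usage against a real cluster (requires gcloud; may also need --zone or --region):
# scopes="$(gcloud container clusters describe cluster-name \
#   --format='value(nodeConfig.oauthScopes)')"
# has_cloud_platform_scope "$scopes" && echo "cloud-platform scope is enabled"
```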

Authoring pipelines to use default service account

Pipelines don’t need any specific changes to authenticate to GCP; they use the default service account transparently.

However, you must update existing pipelines that use the use_gcp_secret operator from the KFP SDK. Remove the use_gcp_secret usage to let your pipeline authenticate to Google Cloud using the default service account.
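
One way to locate pipelines that still need this update is to search your pipeline sources for the operator name. The sketch below assumes your pipelines are defined in Python files under a single directory; find_use_gcp_secret is a hypothetical helper name.

```shell
# Hypothetical helper: lists every line in Python files under a directory
# (default: the current directory) that still references use_gcp_secret.
find_use_gcp_secret() {
  grep -rn --include='*.py' 'use_gcp_secret' "${1:-.}"
}

# Usage:
# find_use_gcp_secret ./pipelines
```

Each match is printed as path:line:content, so you can review the usages one by one before removing them.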

Securing the cluster with fine-grained GCP permission control

Workload Identity

Workload Identity is the recommended way for your GKE applications to consume services provided by Google APIs. You accomplish this by configuring a Kubernetes service account to act as a Google service account. Any Pods running as the Kubernetes service account then use the Google service account to authenticate to cloud services.

The description above is quoted from the Workload Identity documentation. Read that documentation for:

A detailed introduction to Workload Identity.
Instructions to enable it on your cluster.
Whether its limitations affect your adoption.

Terminology

This document distinguishes between Kubernetes service accounts (KSAs) and Google service accounts (GSAs). KSAs are Kubernetes resources, while GSAs are specific to Google Cloud. Other documentation usually refers to both of them as just “service accounts”.

Authoring pipelines to use Workload Identity

Pipelines don’t need any specific changes to authenticate to Google Cloud. With Workload Identity, pipelines run as the Google service account that is bound to the KSA.

However, existing pipelines that use the use_gcp_secret operator from the KFP SDK must remove that usage to run as the bound GSA. You can also keep using use_gcp_secret in a cluster with Workload Identity enabled; in that case, use_gcp_secret takes precedence for those workloads.

Cluster setup to use Workload Identity for Pipelines Standalone

1. Create your cluster with Workload Identity enabled

In the Google Cloud Console UI, you can enable Workload Identity in Create a Kubernetes cluster -> Security -> Enable Workload Identity.
Using gcloud CLI, you can enable it with:

gcloud beta container clusters create cluster-name \
  --release-channel regular \
  --workload-pool=project-id.svc.id.goog
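
After the cluster is created, you may want to confirm that the expected workload pool is set. The following is a sketch only: expected_workload_pool is a hypothetical helper, and the field path in the commented usage is an assumption based on the --workload-pool flag above, so it may differ across gcloud and API versions.

```shell
# Hypothetical helper: builds the workload pool name for a project,
# following the project-id.svc.id.goog pattern used in the command above.
expected_workload_pool() {
  printf '%s.svc.id.goog\n' "$1"
}

# Usage against a real cluster (requires gcloud; may also need --zone or --region):
# actual="$(gcloud container clusters describe cluster-name \
#   --format='value(workloadIdentityConfig.workloadPool)')"
# [ "$actual" = "$(expected_workload_pool project-id)" ] && echo "Workload Identity is enabled"
```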

2. Deploy Kubeflow Pipelines

Deploy via Pipelines Standalone as usual.

3. Bind Workload Identities for KSAs used by Kubeflow Pipelines

The following helper bash scripts bind Workload Identities for KSAs used by Kubeflow Pipelines:

gcp-workload-identity-setup.sh helps you create GSAs and bind them to KSAs used by pipelines workloads. This script provides an interactive command line dialog with explanation messages.
wi-utils.sh alternatively provides minimal utility bash functions that let you customize your setup. The minimal utilities make it easy to read and use programmatically.

For example, to get a default setup using gcp-workload-identity-setup.sh, run:

$ curl -O https://raw.githubusercontent.com/kubeflow/pipelines/master/manifests/kustomize/gcp-workload-identity-setup.sh
$ chmod +x ./gcp-workload-identity-setup.sh
$ ./gcp-workload-identity-setup.sh
# This prints the command's usage example and introduction.
# Then you can run the command with required parameters.
# Command output will tell you which GSAs and Workload Identity bindings have been
# created.
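
If you prefer to bind a single KSA to a GSA by hand, the binding generally comes down to two commands: one IAM policy binding on the GSA and one annotation on the KSA. The sketch below only prints those commands so you can review them first. The helper names are hypothetical, and this is not a description of what the helper scripts above do internally.

```shell
# Hypothetical helper: builds the IAM member string that identifies a KSA
# inside a project's Workload Identity pool.
wi_member() {
  local project_id="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project_id" "$namespace" "$ksa"
}

# Hypothetical helper: prints (does not run) the two commands that bind
# a KSA to a GSA with Workload Identity.
print_wi_binding_commands() {
  local project_id="$1" namespace="$2" ksa="$3" gsa_email="$4"
  # 1. Allow the KSA to impersonate the GSA.
  echo "gcloud iam service-accounts add-iam-policy-binding ${gsa_email}" \
       "--role=roles/iam.workloadIdentityUser" \
       "--member='$(wi_member "$project_id" "$namespace" "$ksa")'"
  # 2. Annotate the KSA so that its Pods use the GSA.
  echo "kubectl annotate serviceaccount --namespace ${namespace} ${ksa}" \
       "iam.gke.io/gcp-service-account=${gsa_email}"
}

# Usage (all names are placeholders):
# print_wi_binding_commands my-project kubeflow pipeline-runner \
#   my-gsa@my-project.iam.gserviceaccount.com
```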

4. Configure IAM permissions of used GSAs

If you used gcp-workload-identity-setup.sh to bind Workload Identities for your cluster, you can simply add the following IAM bindings:

Give the GSA <cluster-name>-kfp-system@<project-id>.iam.gserviceaccount.com the Storage Object Viewer role to let the UI load data from GCS in the same project.
Give the GSA <cluster-name>-kfp-user@<project-id>.iam.gserviceaccount.com any permissions your pipelines need. For quick tryouts, you can give it the Project Editor role, which covers all permissions.

If you configured the bindings yourself, here are the GCP permission requirements for the KFP KSAs:

Pipelines use the pipeline-runner KSA. Configure the IAM permissions of the GSA bound to this KSA to allow pipelines to use GCP APIs.
The Pipelines UI uses the ml-pipeline-ui KSA, and the Pipelines Visualization Server uses the ml-pipeline-visualizationserver KSA. If you need to view artifacts and visualizations stored in Google Cloud Storage (GCS) from the Pipelines UI, add the Storage Object Viewer permission (or the minimal required permission) to their bound GSAs.
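
The role bindings described above can be sketched with gcloud as follows. The helper names are hypothetical, the GSA naming follows the <cluster-name>-kfp-system and <cluster-name>-kfp-user pattern mentioned above, and the second helper only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: builds the email of a GSA that follows the
# <cluster-name>-kfp-<suffix>@<project-id>.iam.gserviceaccount.com pattern.
kfp_gsa_email() {
  local cluster_name="$1" project_id="$2" suffix="$3"  # suffix: system or user
  printf '%s-kfp-%s@%s.iam.gserviceaccount.com\n' "$cluster_name" "$suffix" "$project_id"
}

# Hypothetical helper: prints (does not run) the command that grants a
# project-level role to a GSA.
print_role_binding_command() {
  local project_id="$1" gsa_email="$2" role="$3"
  echo "gcloud projects add-iam-policy-binding ${project_id}" \
       "--member=serviceAccount:${gsa_email}" \
       "--role=${role}"
}

# Usage: grant Storage Object Viewer to the system GSA (names are placeholders).
# print_role_binding_command my-project \
#   "$(kfp_gsa_email my-cluster my-project system)" roles/storage.objectViewer
```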

Cluster setup to use Workload Identity for Full Kubeflow

The public Kubeflow v1.0.1 release does not configure Workload Identity for Kubeflow Pipelines out of the box. The fix has been merged, but has not yet been released publicly. After the fix is released, if you deploy Kubeflow following the GCP instructions, Workload Identity will already be configured properly for Kubeflow Pipelines.

If you want to use Workload Identity with pipelines on Kubeflow v1.0.1 or earlier, we recommend running the following commands to patch your deployment:

export NAMESPACE=kubeflow # Replace with your kubeflow's namespace if it's been customized.
kubectl patch deployment -n ${NAMESPACE} ml-pipeline --patch '{"spec": {"template": {"spec": {"containers": [{"name": "ml-pipeline-api-server", "env": [{"name": "DEFAULTPIPELINERUNNERSERVICEACCOUNT", "value": "kf-user"}]}]}}}}'
kubectl patch clusterrolebinding -n ${NAMESPACE} pipeline-runner --patch '{"subjects": [{"kind": "ServiceAccount", "name": "kf-user", "namespace": "'$NAMESPACE'"}]}'

In Full Kubeflow, pipelines use the kf-user KSA by default, which differs from Pipelines Standalone.

Google service account keys stored as Kubernetes secrets

Workload Identity is recommended because it is easier to manage and more secure, but you can also choose to use GSA keys.

Authoring pipelines to use GSA keys

Each pipeline step describes a container that runs independently. If you want to grant a single step access to one of your service accounts, you can use kfp.gcp.use_gcp_secret(). Examples of how to use this function can be found in the Kubeflow examples repo.

Cluster setup to use use_gcp_secret for Full Kubeflow

You don’t need to do anything. A Full Kubeflow deployment already deploys the user-gcp-sa secret for you.

Cluster setup to use use_gcp_secret for Pipelines Standalone

Pipelines Standalone requires you to manually set up the user-gcp-sa secret used by use_gcp_secret.

Instructions to set up the secret:

First, create and download a key for the Google service account you want to use (refer to GCP documentation for more information):

gcloud iam service-accounts keys create application_default_credentials.json \
  --iam-account [SA-NAME]@[PROJECT-ID].iam.gserviceaccount.com

Run:

kubectl create secret -n [your-namespace] generic user-gcp-sa \
  --from-file=user-gcp-sa.json=application_default_credentials.json
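
Before creating the secret, it can be worth checking that the downloaded file really is a service account key and not another kind of credential file. The sketch below relies on service account key files containing a "type" field set to "service_account"; is_service_account_key is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given JSON file declares
# "type": "service_account", as service account key files do.
is_service_account_key() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Usage:
# is_service_account_key application_default_credentials.json \
#   && echo "key file looks like a service account key"
#
# After creating the secret, you can confirm that it exists with:
# kubectl get secret user-gcp-sa -n [your-namespace]
```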

Last modified 21.04.2020: Restructured the website repo to allow for future i18n and content translation (#1909) (d0bd0e03)

You are viewing documentation for Kubeflow 1.0