Securing Your Clusters
Alpha
This feature is currently in alpha release status with limited support. The Kubeflow team is interested in any feedback you may have, in particular with regard to the usability of the feature.
This guide describes how to secure Kubeflow using VPC Service Controls and private GKE.
Together these two features significantly increase security and mitigate the risk of data exfiltration.
- VPC Service Controls allow you to define a perimeter around Google Cloud Platform (GCP) services. Kubeflow uses VPC Service Controls to prevent applications running on GKE from writing data to GCP resources outside the perimeter.
- Private GKE removes public IP addresses from GKE nodes, making them inaccessible from the public internet. Kubeflow uses IAP to make Kubeflow web apps accessible from your browser.
VPC Service Controls allow you to restrict which Google services are accessible from your GKE/Kubeflow clusters. This is an important part of security and, in particular, helps mitigate the risk of data exfiltration.
For more information refer to the VPC Service Control Docs.
Creating a private Kubernetes Engine cluster means the Kubernetes Engine nodes won’t have public IP addresses. This can improve security by blocking unwanted outbound/inbound access to nodes. Removing IP addresses means external services (such as GitHub, PyPi, and DockerHub) won’t be accessible from the nodes. Google services (such as BigQuery and Cloud Storage) are still accessible.
Importantly this means you can continue to use your Google Container Registry (GCR) to host your Docker images. Other Docker registries (for example, DockerHub) will not be accessible. If you need to use Docker images hosted outside GCR you can use the scripts provided by Kubeflow to mirror them to your GCR registry.
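To see which images are already hosted in your project's registry, you can optionally list them with gcloud (this assumes gcloud is installed and authenticated for your project):

  gcloud container images list --repository=gcr.io/<your GCP project id>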
Before you start
Before installing Kubeflow ensure you have installed the following tools:
You will need to know your gcloud organization ID and project number; you can get them via gcloud.
export PROJECT=<your GCP project id>
export ORGANIZATION_NAME=<name of your organization>
export ORGANIZATION=$(gcloud organizations list --filter=DISPLAY_NAME=${ORGANIZATION_NAME} --format='value(name)')
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT} --format='value(projectNumber)')
- Projects are identified by names, IDs, and numbers. For more info, see Identifying projects.
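As an optional sanity check, echo the variables to confirm they resolved to non-empty values before continuing:

  echo "Project: ${PROJECT} (number: ${PROJECT_NUMBER})"
  echo "Organization: ${ORGANIZATION}"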
Enable VPC Service Controls In Your Project
- Enable VPC Service Controls:

  export PROJECT=<Your project>
  gcloud services enable accesscontextmanager.googleapis.com \
    cloudresourcemanager.googleapis.com \
    dns.googleapis.com --project=${PROJECT}
- Check if you have an access policy object already created:

  gcloud beta access-context-manager policies list \
    --organization=${ORGANIZATION}
- An access policy is a GCP resource object that defines service perimeters. There can be only one access policy object in an organization, and it is a child of the Organization resource.
- If you don't have an access policy object, create one:

  gcloud beta access-context-manager policies create \
    --title "default" --organization=${ORGANIZATION}
- Save the Access Policy Object ID as an environment variable so that it can be used in subsequent commands:

  export POLICYID=$(gcloud beta access-context-manager policies list --organization=${ORGANIZATION} --limit=1 --format='value(name)')
- Create a service perimeter:

  gcloud beta access-context-manager perimeters create KubeflowZone \
    --title="Kubeflow Zone" --resources=projects/${PROJECT_NUMBER} \
    --restricted-services=bigquery.googleapis.com,containerregistry.googleapis.com,storage.googleapis.com \
    --project=${PROJECT} --policy=${POLICYID}
  - Here we have created a service perimeter with the name KubeflowZone.
  - The perimeter is created in the project identified by PROJECT_NUMBER and restricts access to GCS (storage.googleapis.com), BigQuery (bigquery.googleapis.com), and GCR (containerregistry.googleapis.com).
  - Placing GCS (Google Cloud Storage) and BigQuery in the perimeter means that access to GCS and BigQuery resources owned by this project is now restricted. By default, access from outside the perimeter will be blocked.
  - More than one project can be added to the same perimeter.
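  - As an optional check, you can list the perimeters in your access policy to confirm that KubeflowZone now exists:

    gcloud beta access-context-manager perimeters list --policy=${POLICYID}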
- Create an access level to allow Google Container Builder to access resources inside the perimeter:

  - Create a members.yaml file with the following contents:

    - members:
      - serviceAccount:${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com
      - user:<your email>

  - Google Container Builder is used to mirror Kubeflow images into the perimeter.
  - Adding your email allows you to access the GCP services inside the perimeter from outside the cluster. This is convenient for building and pushing images and data from your local machine.
  - For more information refer to the docs.
- Create the access level:

  gcloud beta access-context-manager levels create kubeflow \
    --basic-level-spec=members.yaml \
    --policy=${POLICYID} \
    --title="Kubeflow ${PROJECT}"

  - The name for the level can't have any hyphens.
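  - Optionally, verify the level was created and inspect its members as a sanity check:

    gcloud beta access-context-manager levels describe kubeflow --policy=${POLICYID}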
- Bind the Access Level to the Service Perimeter:

  gcloud beta access-context-manager perimeters update KubeflowZone \
    --add-access-levels=kubeflow \
    --policy=${POLICYID}
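  - Optionally, describe the perimeter to confirm that the kubeflow access level is now attached:

    gcloud beta access-context-manager perimeters describe KubeflowZone \
      --policy=${POLICYID}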
Set up container registry for GKE private clusters:
Follow the steps below to configure your GCR registry to be accessible from your secured clusters. For more info see instructions.
- Create a managed private zone:

  export ZONE_NAME=kubeflow
  export NETWORK=<Network you are using for your cluster>
  gcloud beta dns managed-zones create ${ZONE_NAME} \
    --visibility=private \
    --networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
    --description="Kubeflow DNS" \
    --dns-name=gcr.io \
    --project=${PROJECT}
- Start a transaction:

  gcloud dns record-sets transaction start \
    --zone=${ZONE_NAME} \
    --project=${PROJECT}
- Add a CNAME record for *.gcr.io:

  gcloud dns record-sets transaction add \
    --name=*.gcr.io. \
    --type=CNAME gcr.io. \
    --zone=${ZONE_NAME} \
    --ttl=300 \
    --project=${PROJECT}
- Add an A record for the restricted VIP:

  gcloud dns record-sets transaction add \
    --name=gcr.io. \
    --type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
    --zone=${ZONE_NAME} \
    --ttl=300 \
    --project=${PROJECT}
- Commit the transaction:

  gcloud dns record-sets transaction execute \
    --zone=${ZONE_NAME} \
    --project=${PROJECT}
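- Optionally, list the record sets in the zone to confirm the CNAME and A records were created:

  gcloud dns record-sets list --zone=${ZONE_NAME} --project=${PROJECT}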
Mirror Kubeflow Application Images
Since private GKE can only access gcr.io, you need to mirror all Kubeflow application images hosted outside gcr.io into your registry. We will use the kfctl tool to accomplish this.
- Set your user credentials. You only need to run this command once:

  gcloud auth application-default login
- Inside your ${KFAPP} directory create a local configuration file mirror.yaml based on this template.
  - Change destination to your project's GCR registry.
- Generate pipeline files to mirror the images by running:

  cd ${KFAPP}
  ./kfctl alpha mirror build mirror.yaml -V -o pipeline.yaml --gcb

  - If you want to use Tekton rather than Google Cloud Build (GCB), drop --gcb to emit a Tekton pipeline.
  - The instructions below assume you are using GCB.
- Edit the cloudbuild.yaml file:

  - In the images section add:

      - <registry domain>/<project_id>/docker.io/istio/proxy_init:1.1.6

    - Replace <registry domain>/<project_id> with your registry.

  - Under the steps section add:

      - args:
        - build
        - -t
        - <registry domain>/<project_id>/docker.io/istio/proxy_init:1.1.6
        - --build-arg=INPUT_IMAGE=docker.io/istio/proxy_init:1.1.6
        - .
        name: gcr.io/cloud-builders/docker
        waitFor:
        - '-'
- Remove the mirroring of the cos-nvidia-installer:fixed image. You don't need it to be replicated because this image is privately available through the GKE internal repo.
  - Remove the image from the images section.
  - Remove it from the steps section.
- Create a cloud build job to do the mirroring:

  gcloud builds submit --async gs://kubeflow-examples/image-replicate/replicate-context.tar.gz \
    --project <project_id> --config cloudbuild.yaml
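  - Because the build is submitted with --async, the command returns immediately. You can optionally check its progress with, for example:

    gcloud builds list --project <project_id> --limit=5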
- Update your manifests to use the mirrored images:

  kfctl alpha mirror overwrite -i pipeline.yaml
- Edit the file kustomize/istio-install/base/istio-noauth.yaml:
  - Replace docker.io/istio/proxy_init:1.1.6 with gcr.io/<project_id>/docker.io/istio/proxy_init:1.1.6
  - Replace docker.io/istio/proxyv2:1.1.6 with gcr.io/<project_id>/docker.io/istio/proxyv2:1.1.6
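  - As an optional sanity check, grep the file to confirm both image references now point at your registry:

    grep -nE "proxy_init|proxyv2" kustomize/istio-install/base/istio-noauth.yaml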
Deploy Kubeflow with Private GKE
Coming Soon
You can follow the issue: Documentation on how to use Kubeflow with private GKE and VPC service controls.
Next steps
- Use GKE Authorized Networks to restrict access to your GKE master
- Learn more about VPC Service Controls
- See how to delete your Kubeflow deployment using the CLI.
- Troubleshoot any issues you may find.