Managing Kubernetes resources in Terraform: Kubernetes provider

October 20, 2023

Using Terraform to manage resources in Kubernetes has the following benefits, when compared to a GitOps solution such as Argo CD or Flux:

All infrastructure is managed with one tool. Teams that already use Terraform don't have to learn how to install, operate and maintain a separate tool to manage applications inside the Kubernetes cluster.
Changes can be done in one commit. For example, the provisioning of a database, saving the connection details in the cluster and then deploying application code that connects to the database.
Faster disaster recovery. In the worst case, we can recover everything locally using the Terraform CLI.

HashiCorp has two official Terraform providers for managing Kubernetes resources: the Kubernetes provider and the Helm provider. In this blog post I'll focus on the Kubernetes provider, walk through examples and list the pros and cons at the end.

The examples were written using Kubernetes 1.26 and Terraform 1.5.

Getting Started

The Kubernetes provider for Terraform provides resources and data sources for most of the Kubernetes APIs. For example, the Terraform equivalent of a Kubernetes Deployment is the kubernetes_deployment resource. All of them are listed in the sidebar of the provider's documentation, grouped by API.

For resources that are not part of the default Kubernetes API, we need to use the kubernetes_manifest resource, which can be the HCL representation of any Kubernetes YAML manifest.
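All of these resources need a configured provider first. A minimal sketch, assuming a local kubeconfig at ~/.kube/config; the context name and the version constraints are illustrative assumptions, not values taken from this post's setup:

```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23" # assumed constraint; pin to the version you have tested
    }
  }
}

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "my-cluster" # hypothetical context name
}
```

Omitting config_context makes the provider use the current context of the kubeconfig file.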

The following examples show the same Kubernetes Deployment in YAML, Terraform kubernetes_deployment and Terraform kubernetes_manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.2-alpine
        ports:
        - containerPort: 80

The same Deployment in Terraform HCL, using the kubernetes_deployment resource of the Kubernetes provider:

resource "kubernetes_deployment" "nginx" {
  metadata {
    name = "nginx"
    labels = {
      app = "nginx"
    }
  }

  spec {
    replicas = 2

    selector {
      match_labels = {
        app = "nginx"
      }
    }

    template {
      metadata {
        labels = {
          app = "nginx"
        }
      }

      spec {
        container {
          image = "nginx:1.25.2-alpine"
          name  = "nginx"

          port {
            container_port = 80
          }
        }
      }
    }
  }
}

Alternatively, we can also use the kubernetes_manifest resource, which can be the HCL representation of any Kubernetes YAML manifest:

resource "kubernetes_manifest" "deployment_nginx_deployment" {
  manifest = {
    "apiVersion" = "apps/v1"
    "kind" = "Deployment"
    "metadata" = {
      "labels" = {
        "app" = "nginx"
      }
      "name" = "nginx"
    }
    "spec" = {
      "replicas" = 2
      "selector" = {
        "matchLabels" = {
          "app" = "nginx"
        }
      }
      "template" = {
        "metadata" = {
          "labels" = {
            "app" = "nginx"
          }
        }
        "spec" = {
          "containers" = [
            {
              "image" = "nginx:1.25.2-alpine"
              "name" = "nginx"
              "ports" = [
                {
                  "containerPort" = 80
                },
              ]
            },
          ]
        }
      }
    }
  }
}

Both HCL versions result in the same Deployment, but kubernetes_deployment is less verbose and lets Terraform do basic validation of the values (such as checking that replicas is an integer). As mentioned above, custom resources leave us no option other than kubernetes_manifest.
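To illustrate the custom resource case, here is a sketch of a Gateway from Istio (which we install below) expressed with kubernetes_manifest. It assumes the Istio CRDs are already present in the cluster; the resource name, namespace and host are illustrative and not part of the installation described in this post.

```hcl
resource "kubernetes_manifest" "gateway_example" {
  manifest = {
    "apiVersion" = "networking.istio.io/v1beta1"
    "kind"       = "Gateway"
    "metadata" = {
      "name"      = "example-gateway" # hypothetical name
      "namespace" = "default"
    }
    "spec" = {
      # Bind to the pods of the default Istio ingress gateway.
      "selector" = {
        "istio" = "ingressgateway"
      }
      "servers" = [
        {
          "hosts" = ["example.com"] # hypothetical host
          "port" = {
            "name"     = "http"
            "number"   = 80
            "protocol" = "HTTP"
          }
        },
      ]
    }
  }
}
```

There is no typed resource such as a kubernetes_gateway for this kind, so the manifest map is the only way to describe it with this provider.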

Rather than just a single Deployment, we're going to install Istio as a real-world example. It requires custom resource definitions (CRDs) and various other resources (RBAC, ServiceAccounts, ConfigMaps etc.) before the Istio daemon (istiod) can be deployed.

Installing Istio

As most Kubernetes resources are distributed in YAML, the first step is always the conversion to Terraform HCL. The Istio YAML manifests for the default profile are around 10,000 lines long and contain 47 Kubernetes resources. Doing the conversion manually would take too long.

To automate the conversion, we can use tfk8s. Here are the commands to generate the Istio YAML manifests and convert them to HCL:

$ istioctl manifest generate > istio.yaml
$ tfk8s -f istio.yaml > istio.tf

CRDs module

A good practice when deploying applications that have CRDs is to put the CRDs into their own Terraform module. When applying, the Kubernetes provider does not distinguish between custom resources and core resources, which can lead to it trying to deploy a custom resource before its definition has been installed.

For our Istio installation we have to (manually) split the istio.tf file into two files, where one of them contains the CRDs. We put them into their own Terraform modules: istio and istio-crds.

The directory tree should look like this:

.
├── istio
│   └── main.tf
├── istio-crds
│   └── main.tf
└── main.tf

In the root main.tf file, we can add a dependency between them, so that the CRDs will be installed first:

module "istio-crds" {
  source = "./istio-crds"
}

module "istio" {
  source = "./istio"

  depends_on = [
    module.istio-crds
  ]
}

We also need to add the namespace to the istio/main.tf file, as it is not created automatically:

resource "kubernetes_namespace" "istio_system" {
  metadata {
    name = "istio-system"
  }
}

Running terraform apply at this point will fail with many errors. I've not included them in this post because the output is too long, but they fall into the following two groups.

Inconsistent result error

Error: Provider produced inconsistent result after apply

When applying changes to kubernetes_manifest.deployment_istio_system_istiod, provider
"provider[\"registry.terraform.io/hashicorp/kubernetes\"]" produced an unexpected new value:
.object.spec.template.spec.containers[0].resources.requests["memory"]: was cty.StringVal("2048Mi"), but now
cty.StringVal("2Gi").

This is a bug in the provider, which should be reported in the provider's own issue tracker.

The Deployment specifies a memory request of 2048Mi, but the Kubernetes API normalizes the quantity and reports it back as 2Gi. The Kubernetes provider does not handle this case, so the fix is to change the value in the istio.tf file to 2Gi.

Two other values in the same file need the same treatment: a memory value of 1024Mi becomes 1Gi, and a CPU value of 2000m becomes 2.
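As an illustration of the kind of edit involved, the fragment below shows resource requests written in the form the API reports back. The local name and the grouping of the two values into one block are assumptions made for the sketch; in the generated file the values sit inside the respective kubernetes_manifest resources.

```hcl
locals {
  # Hypothetical container resources, written in the form the
  # Kubernetes API returns so that the value read back after
  # apply is identical to the configured one.
  example_resources = {
    "requests" = {
      "cpu"    = "2"   # instead of "2000m"
      "memory" = "2Gi" # instead of "2048Mi"
    }
  }
}
```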

Null value conversion error

Error: API response status: Failure

  with kubernetes_manifest.deployment_istio_system_istio_ingressgateway,
  on istio.tf line 14211, in resource "kubernetes_manifest" "deployment_istio_system_istio_ingressgateway":
14211: resource "kubernetes_manifest" "deployment_istio_system_istio_ingressgateway" {

Deployment.apps "istio-ingressgateway" is invalid:
spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms: Required value: must
have at least one node selector term

The problem is with the conversion of null values. For example, this YAML:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    preferredDuringSchedulingIgnoredDuringExecution:

which was converted to:

"affinity" = {
  "nodeAffinity" = {
    "preferredDuringSchedulingIgnoredDuringExecution" = null
    "requiredDuringSchedulingIgnoredDuringExecution" = null
  }
}

The original YAML file will apply successfully with kubectl because it removes fields with empty values. But the Kubernetes provider for Terraform doesn't work in the same way. When we set the value to null, it will send the empty field to the Kubernetes API, resulting in the above error.

To fix it we have to remove all keys with null values from the HCL file. I've submitted a feature request to tfk8s to remove them automatically.
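Applied to the affinity example above, removing the null keys leaves an empty nodeAffinity object. The sketch wraps the fragment in a locals block only to make it a complete HCL snippet; the local name is hypothetical.

```hcl
locals {
  # The affinity fragment from above after deleting both keys
  # that were generated with a null value.
  example_affinity = {
    "affinity" = {
      "nodeAffinity" = {}
    }
  }
}
```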

After the removal, we can apply successfully:

$ terraform apply

Apply complete! Resources: 48 added, 0 changed, 0 destroyed.

Re-apply issues

But when we run terraform plan again, it shows two changes, even though we didn't change anything:

Terraform will perform the following actions:

  # module.istio.kubernetes_manifest.service_istio_system_istio_ingressgateway will be updated in-place
  ~ resource "kubernetes_manifest" "service_istio_system_istio_ingressgateway" {
      ~ object   = {
          ~ metadata   = {
              + annotations                = (known after apply)
                name                       = "istio-ingressgateway"
                # (13 unchanged attributes hidden)
            }
            # (3 unchanged attributes hidden)
        }
        # (1 unchanged attribute hidden)
    }

  # module.istio.kubernetes_manifest.validatingwebhookconfiguration_istio_validator_istio_system will be updated in-place
  ~ resource "kubernetes_manifest" "validatingwebhookconfiguration_istio_validator_istio_system" {
      ~ object   = {
          ~ webhooks   = [
              ~ {
                  ~ failurePolicy           = "Fail" -> "Ignore"
                    name                    = "rev.validation.istio.io"
                    # (9 unchanged attributes hidden)
                },
            ]
            # (3 unchanged attributes hidden)
        }
        # (1 unchanged attribute hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

Trying to apply these changes will fail with the following error:

Error: There was a field manager conflict when trying to apply the manifest for "/istio-validator-istio-system"

  with module.istio.kubernetes_manifest.validatingwebhookconfiguration_istio_validator_istio_system,
  on istio/main.tf line 1173, in resource "kubernetes_manifest" "validatingwebhookconfiguration_istio_validator_istio_system":
1173: resource "kubernetes_manifest" "validatingwebhookconfiguration_istio_validator_istio_system" {

The API returned the following conflict: "Apply failed with 1 conflict: conflict with \"pilot-discovery\" using
admissionregistration.k8s.io/v1: .webhooks[name=\"rev.validation.istio.io\"].failurePolicy"

You can override this conflict by setting "force_conflicts" to true in the "field_manager" block.

The suggested fix of setting force_conflicts = true is not a good solution. It lets the apply go through, but the same changes will show up again in every subsequent plan.
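For reference, the override suggested by the error message would look roughly like the following shortened sketch. The manifest body is abbreviated for illustration; only the field_manager block matters here.

```hcl
# shortened example
resource "kubernetes_manifest" "validatingwebhookconfiguration_istio_validator_istio_system" {
  manifest = {
    "apiVersion" = "admissionregistration.k8s.io/v1"
    "kind"       = "ValidatingWebhookConfiguration"
    "metadata" = {
      "name" = "istio-validator-istio-system"
    }
    "webhooks" = [
      {
        "name"          = "rev.validation.istio.io"
        "failurePolicy" = "Ignore"
      }
    ]
  }

  # Take ownership of fields that another field manager
  # (here: pilot-discovery) has already written.
  field_manager {
    force_conflicts = true
  }
}
```

This only resolves the ownership conflict. The configured value still differs from the one found in the cluster, which is presumably why the diff keeps coming back.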

The cause of the issue can be found by looking at the Istio YAML manifests, which have the following comment:

# Fail open until the validation webhook is ready. The webhook controller
# will update this to `Fail` and patch in the `caBundle` when the webhook
# endpoint is ready.
failurePolicy: Ignore

The issue is that the Istio webhook controller will change the failurePolicy after the deployment, but this state change is not reflected in the Terraform state.

To fix it, we can comment out the failurePolicy in the ValidatingWebhookConfiguration. The API then defaults it to Fail, which is the value the Istio webhook controller sets anyway:

# shortened example
resource "kubernetes_manifest" "validatingwebhookconfiguration_istio_validator_istio_system" {
  manifest = {
    "apiVersion" = "admissionregistration.k8s.io/v1"
    "kind" = "ValidatingWebhookConfiguration"
    "webhooks" = [
      {
        "name" = "rev.validation.istio.io"
        // "failurePolicy" = "Ignore"
      }
    ]
  }
}

Now running terraform plan will show us no changes. The installation is complete and we have successfully installed Istio using the Kubernetes provider.

Conclusion

The Terraform Kubernetes provider is a good option for managing application deployments in the following cases:

Small teams with simple infrastructure.
Custom, minimal deployment specifications for third-party applications.
Applications that are updated infrequently.

For larger, production deployments I wouldn't consider it a good option:

The YAML to HCL conversion takes too long. Tools like tfk8s are helpful, but not perfect. Getting it to apply successfully requires trial and error.
Upgrading to a new version is difficult. The whole process of converting and fixing has to be repeated.
Running terraform plan takes too long. It's easy to have over 100 resources to manage after installing a few third party applications. (We could use the -target option, but then always need to find the right resource names).
Constant fixing of Terraform state. Both tools manage their own state, and Kubernetes constantly reconciles. Any change in the Kubernetes state has to be carried over to the Terraform side manually. See the example above, where Istio patches Kubernetes resources after the deployment and Terraform keeps trying to revert them.

In my next blog post I'm going to cover the Terraform Helm provider, which makes it easier to install third party applications, but also comes with a few downsides.