Terraform vs Pulumi: We Chose Terraform

We evaluated Pulumi seriously. The pitch is compelling: write infrastructure in TypeScript, Python, or Go instead of HCL. Real functions, real loops, real testing frameworks. For a team that writes application code all day, Pulumi feels natural.

We went with Terraform anyway. Six months later, we're confident it was the right call for how we work.

The problem

Infrastructure as Code is a solved category in the sense that every tool works. The question is which tool's failure modes you can tolerate and which tool's constraints align with your team's workflow.

We provision infrastructure for client projects -- typically a VPC, a Kubernetes cluster (or a VM), a managed database, S3-compatible storage, and DNS records. We maintain 8-12 active infrastructure projects at any time, each with staging and production environments.

Why we evaluated Pulumi

Three things attracted us:

Real programming constructs. HCL's for_each and count have sharp edges. Conditional resources require ternary hacks. Pulumi gives you if statements, for loops, and functions.
Testability. Pulumi programs are testable with standard unit testing frameworks. Terraform testing is bolted on (terraform test is recent and limited).
Reusable components. Pulumi components can be published as npm or PyPI packages. Sharing infrastructure patterns across projects is natural.
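
To make the "ternary hacks" concrete, here is a minimal sketch of a conditional resource in HCL. The enable_flow_logs variable, the log group, and its name are hypothetical and chosen only for illustration; they are not taken from our modules.

```hcl
# Hypothetical toggle, for illustration only.
variable "enable_flow_logs" {
  type    = bool
  default = false
}

# HCL has no "if" for resources. The usual idiom is count with a ternary:
# one instance when the flag is true, zero otherwise.
resource "aws_cloudwatch_log_group" "flow_logs" {
  count = var.enable_flow_logs ? 1 : 0

  name              = "/example/vpc-flow-logs"
  retention_in_days = 30
}

# The sharp edge: the resource is now a list, so every reference has to
# handle the empty case. one() returns null when the list is empty.
output "flow_log_group_arn" {
  value = one(aws_cloudwatch_log_group.flow_logs[*].arn)
}
```

In a general-purpose language the same toggle would be an ordinary if statement around the resource constructor, which is the ergonomic gap Pulumi closes.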

Why we chose Terraform

After a two-week evaluation in which we provisioned the same infrastructure (VPC + EKS cluster + RDS + S3) in both tools, four things decided it:

1. HCL's constraints are a feature.

HCL is declarative and limited. You can't write arbitrary logic, import libraries, or build abstractions that hide infrastructure behind function calls. This feels like a limitation until you realize that infrastructure code should be inspectable by anyone on the team -- including the person on-call at 2 AM who didn't write it.

Pulumi's flexibility lets you build abstractions like:

const cluster = new ProductionCluster("healthcare", {
    region: "eu-west-1",
    nodeCount: 3,
    addons: ["monitoring", "logging"],
});

This is elegant. It's also opaque. What resources does ProductionCluster create? What IAM roles? What network configuration? You have to read the component source code, which can contain arbitrary logic. In our Terraform code, every resource is declared explicitly in the .tf files, and even a module is just more declarative HCL that reads top to bottom. Debugging at 2 AM favors explicitness.

2. The ecosystem is unmatched.

Terraform has providers for every major cloud, most SaaS platforms, and many internal tools. We've never hit a resource that Terraform couldn't manage. Pulumi's provider coverage is good (many of its providers wrap Terraform providers via the Pulumi Terraform Bridge), but we encountered gaps in less common providers.

3. State management is battle-tested.

Terraform's state model is well-understood, well-documented, and has mature tooling for state manipulation (terraform state mv, terraform import). We've recovered from state issues multiple times using these tools. Pulumi's state management works, but the escape hatches are less mature.
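
As one illustration of these escape hatches: since Terraform 1.1, many renames that used to need a manual terraform state mv can be recorded in configuration with a moved block. The following is a minimal sketch with hypothetical resource addresses, not a record of an actual recovery we performed.

```hcl
# Hypothetical refactor: the VPC resource is renamed from "main" to "primary".
# Without this block, the plan would propose destroying aws_vpc.main and
# creating aws_vpc.primary. With it, Terraform updates the state address instead.
moved {
  from = aws_vpc.main
  to   = aws_vpc.primary
}

resource "aws_vpc" "primary" {
  cidr_block = "10.0.0.0/16" # placeholder CIDR
}
```

The move then shows up in terraform plan like any other change, so it can be reviewed before it touches state.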

4. Hiring and onboarding.

Most infrastructure engineers already know Terraform, and HCL is roughly a two-day learning curve for a developer who doesn't. Pulumi knowledge is rarer. When we staff a new project, Terraform competency is assumed; Pulumi competency would need to be taught.

Our Terraform patterns

# modules/vpc/main.tf -- reusable VPC module
resource "aws_vpc" "main" {
  cidr_block           = var.cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-vpc"
  })
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.azs[count.index]

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-private-${count.index}"
  })
}
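
The module above references several input variables that are not shown. A minimal sketch of the declarations it assumes, plus a call site, might look like the following. The types are inferred from how the variables are used; the file name, source path, descriptions, and all values are illustrative assumptions.

```hcl
# modules/vpc/variables.tf -- inputs assumed by main.tf (types inferred from usage)
variable "project" {
  type        = string
  description = "Project name used in resource names."
}

variable "environment" {
  type        = string
  description = "Environment name, for example staging or prod."
}

variable "cidr" {
  type        = string
  description = "CIDR block for the VPC."
}

variable "private_subnet_cidrs" {
  type        = list(string)
  description = "One CIDR block per private subnet."
}

variable "azs" {
  type        = list(string)
  description = "Availability zones; needs at least one entry per private subnet."
}

variable "common_tags" {
  type        = map(string)
  default     = {}
  description = "Tags merged into every resource."
}

# Root module call site (hypothetical values)
module "vpc" {
  source = "./modules/vpc"

  project              = "example"
  environment          = "staging"
  cidr                 = "10.0.0.0/16"
  private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
  azs                  = ["eu-west-1a", "eu-west-1b"]
  common_tags          = { ManagedBy = "terraform" }
}
```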

Our patterns:

Modules for reuse, not abstraction. Modules group related resources. They don't hide complexity. Every input and output is documented.
Remote state in S3 with DynamoDB locking. Non-negotiable. Local state is a ticking time bomb.
Workspaces for environments. Same configuration, different state files. terraform workspace select prod && terraform apply.
terraform plan in CI, terraform apply manually. Every PR shows the plan. Apply requires explicit approval.
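
For the remote state and workspace patterns, a minimal sketch of the configuration could look like this. The bucket, table, key, region, and node counts are placeholders, not our real values.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # placeholder bucket name
    key            = "network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "example-terraform-locks" # table needs a string hash key named "LockID"
    encrypt        = true
  }
}

# With workspaces, one configuration picks per-environment values from the
# active workspace name. Hypothetical values for illustration.
locals {
  node_counts = {
    staging = 2
    prod    = 3
  }

  # Fails in any workspace without an entry above (including "default"),
  # which guards against applying to an unintended workspace.
  node_count = local.node_counts[terraform.workspace]
}
```

With the S3 backend, state for a non-default workspace is stored under env:/<workspace>/ followed by the configured key, so staging and prod never share a state file.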

In production

We currently manage infrastructure for 8 client projects with Terraform. Patterns that emerged:

One repo per client, not one repo for all clients. Client infrastructure has different lifecycles, access controls, and compliance requirements. Shared repos create coupling.
Pin provider versions. An AWS provider update changed the default behavior of S3 bucket ACLs and broke object access for a client. We now pin exact versions and upgrade deliberately.
terraform import is your friend. Clients often have existing infrastructure created via console. We import it into Terraform state rather than recreating it, avoiding downtime.
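
The version pinning and import points above can be sketched in configuration as follows. The provider version, bucket name, and resource address are placeholders; the import block is the declarative counterpart of the terraform import command and assumes Terraform 1.5 or later.

```hcl
terraform {
  required_version = ">= 1.5.0" # import blocks need Terraform 1.5+

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.31.0" # placeholder; a bare version string is an exact pin
    }
  }
}

# Adopt a bucket that was created by hand in the console (hypothetical name).
import {
  to = aws_s3_bucket.assets
  id = "example-client-assets" # for S3 buckets, the import ID is the bucket name
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-client-assets"
}
```

Committing the .terraform.lock.hcl file alongside this records the selected provider versions and checksums, so CI and local runs resolve the same builds.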

The tradeoffs

HCL is verbose for complex logic. Creating conditional resources, dynamic blocks, and cross-module references in HCL is clunkier than in a real programming language. We accept this trade for explicitness.
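Testing is limited. terraform validate catches syntax errors and invalid references. terraform plan catches configuration errors that can be detected before applying. Neither catches logical errors, such as a security group that is valid but too permissive. We rely on staging environments for validation.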
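Drift detection is passive. Terraform only detects drift when you run terraform plan. Unlike Argo CD for Kubernetes, there's no continuous reconciliation. We run terraform plan on a weekly cron to catch drift; with the -detailed-exitcode flag, plan exits with code 2 when it finds changes, which makes a scheduled run easy to alert on.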
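
On the testing point, variable validation blocks do not replace a staging environment, but they can move some input mistakes to plan time. A minimal sketch, assuming the environment and private_subnet_cidrs variables of a VPC module like ours; the allowed values and messages are illustrative.

```hcl
variable "environment" {
  type = string

  validation {
    # Reject typos such as "production" before any resource is planned.
    condition     = contains(["staging", "prod"], var.environment)
    error_message = "The environment must be either staging or prod."
  }
}

variable "private_subnet_cidrs" {
  type = list(string)

  validation {
    # cidrhost() errors on malformed input; can() turns that error into false.
    condition = alltrue([
      for c in var.private_subnet_cidrs : can(cidrhost(c, 0))
    ])
    error_message = "Every entry in private_subnet_cidrs must be a valid CIDR block."
  }
}
```

These checks only cover what can be expressed about the inputs; whether the resulting infrastructure behaves correctly still has to be verified in staging.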

When we'd choose Pulumi

A team of full-stack TypeScript developers with no HCL experience and no desire to learn it
A project that requires complex infrastructure composition with heavy reuse across many projects
An organization building an internal platform where infrastructure is consumed as library components

Our recommendation

For most teams provisioning cloud infrastructure, Terraform is the right choice. The ecosystem is the largest, the knowledge base is the deepest, and HCL's constraints prevent the kind of over-engineering that turns infrastructure code into application code.

Choose Pulumi if your team already knows it, if you need infrastructure-as-library patterns, or if HCL's limitations are actively causing pain on your projects. Both tools work. Terraform's failure modes are better documented.