Terraform vs Pulumi: we chose Terraform and here is why
Pulumi lets you write infrastructure in real languages. Terraform makes you write HCL. We chose HCL because infrastructure should be boring.
We evaluated Pulumi seriously. The pitch is compelling: write infrastructure in TypeScript, Python, or Go instead of HCL. Real functions, real loops, real testing frameworks. For a team that writes application code all day, Pulumi feels natural.
We went with Terraform anyway. Six months later, we're confident it was the right call for how we work.
The problem
Infrastructure as Code is a solved category in the sense that every tool works. The question is which tool's failure modes you can tolerate and which tool's constraints align with your team's workflow.
We provision infrastructure for client projects -- typically a VPC, a Kubernetes cluster (or a VM), a managed database, S3-compatible storage, and DNS records. We maintain 8-12 active infrastructure projects at any time, each with staging and production environments.
Why we evaluated Pulumi
Three things attracted us:
- Real programming constructs. HCL's `for_each` and `count` have sharp edges. Conditional resources require ternary hacks. Pulumi gives you `if` statements, `for` loops, and functions.
- Testability. Pulumi programs are testable with standard unit testing frameworks. Terraform testing is bolted on (`terraform test` is recent and limited).
- Reusable components. Pulumi components can be published as npm/pip packages. Sharing infrastructure patterns across projects is natural.
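For anyone who hasn't hit it, the "ternary hack" looks roughly like the sketch below: a hypothetical NAT gateway that should exist only when a flag is set. The names `enable_nat` and `public_subnet_id` are illustrative assumptions, and the arguments assume a recent AWS provider.

```hcl
# Hypothetical inputs for this sketch.
variable "enable_nat" {
  type    = bool
  default = false
}

variable "public_subnet_id" {
  type = string
}

# count = 0 or 1 is the usual way to make a resource optional.
resource "aws_eip" "nat" {
  count  = var.enable_nat ? 1 : 0
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = var.enable_nat ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = var.public_subnet_id
}

# Each resource is now a list, so consumers must unwrap it:
# one() returns the single element, or null when the list is empty.
output "nat_gateway_id" {
  value = one(aws_nat_gateway.main[*].id)
}
```

Because `count` turns each resource into a list, every reference has to index into it (`aws_eip.nat[0]`) or unwrap it (`one(...)`). A bare `aws_nat_gateway.main[0].id` elsewhere in the configuration fails with an invalid-index error as soon as the flag is false.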
Why we chose Terraform
After a two-week evaluation where we provisioned the same infrastructure (VPC + EKS cluster + RDS + S3) in both tools:
1. HCL's constraints are a feature.
HCL is declarative and limited. You can't write arbitrary logic, import libraries, or build abstractions that hide infrastructure behind function calls. This feels like a limitation until you realize that infrastructure code should be inspectable by anyone on the team -- including the person on-call at 2 AM who didn't write it.
Pulumi's flexibility lets you build abstractions like:
```typescript
// ProductionCluster is an in-house component class, not part of the Pulumi SDK
const cluster = new ProductionCluster("healthcare", {
  region: "eu-west-1",
  nodeCount: 3,
  addons: ["monitoring", "logging"],
});
```
This is elegant. It's also opaque. What resources does ProductionCluster create? What IAM roles? What network configuration? You have to read the component source code. In Terraform, every resource is explicit in the .tf files. There's nothing hidden behind an abstraction. Debugging at 2 AM favors explicitness.
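For contrast, here is a sketch of how part of that cluster reads in Terraform. It covers only the control plane and its IAM role; node groups (the `nodeCount: 3`) and the monitoring and logging add-ons are left out, and the role name and `private_subnet_ids` input are hypothetical.

```hcl
provider "aws" {
  region = "eu-west-1"
}

# Hypothetical input: subnets for the cluster's network interfaces.
variable "private_subnet_ids" {
  type = list(string)
}

# IAM role the EKS control plane assumes, declared next to the cluster.
resource "aws_iam_role" "eks_cluster" {
  name = "healthcare-eks-cluster"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# AWS-managed policy the EKS control plane needs.
resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_eks_cluster" "healthcare" {
  name     = "healthcare"
  role_arn = aws_iam_role.eks_cluster.arn

  # Network placement is spelled out, not implied by a component.
  vpc_config {
    subnet_ids = var.private_subnet_ids
  }

  # Attach the policy before the cluster is created.
  depends_on = [aws_iam_role_policy_attachment.eks_cluster]
}
```

It is longer than the one-line component, but the IAM role, the managed policy, and the subnet placement are all on the page for whoever is on call.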
2. The ecosystem is unmatched.
Terraform has providers for virtually every cloud service, SaaS platform, and most internal tools. We've never hit a resource that Terraform couldn't manage. Pulumi's provider coverage is good (many of its providers wrap Terraform providers via the Pulumi Terraform Bridge), but we encountered gaps in less common providers.
3. State management is battle-tested.
Terraform's state model is well-understood, well-documented, and has mature tooling for state manipulation (terraform state mv, terraform import). We've recovered from state issues multiple times using these tools. Pulumi's state management works, but the escape hatches are less mature.
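Since Terraform 1.1, the `terraform state mv` case can also be recorded in code with a `moved` block, so a rename goes through the same PR review and plan/apply flow as any other change. A minimal sketch, assuming the `aws_subnet.private` resource from the VPC module later in this post is renamed to a hypothetical `aws_subnet.app`:

```hcl
# The resource formerly declared as aws_subnet.private, under its new name.
resource "aws_subnet" "app" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.azs[count.index]
}

# Tells Terraform to move the existing state entries instead of
# destroying aws_subnet.private and creating aws_subnet.app.
moved {
  from = aws_subnet.private
  to   = aws_subnet.app
}
```

The next plan reports a move rather than a destroy-and-create. Once every workspace that held the old address has been applied, the block can be deleted.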
4. Hiring and onboarding.
Most infrastructure engineers already know Terraform. HCL is a 2-day learning curve for any developer. Pulumi knowledge is rarer. When we staff a new project, Terraform competency is assumed; Pulumi competency would need to be taught.
Our Terraform patterns
```hcl
# modules/vpc/main.tf -- reusable VPC module
# Inputs (var.*) are declared elsewhere in the module.
resource "aws_vpc" "main" {
  cidr_block           = var.cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-vpc"
  })
}

# One private subnet per CIDR; azs must be index-aligned with the CIDR list.
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.azs[count.index]

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-private-${count.index}"
  })
}
```
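The module reads six inputs. A sketch of matching declarations, with every input and output described; the `variables.tf`/`outputs.tf` split and the description wording are illustrative assumptions:

```hcl
# modules/vpc/variables.tf
variable "project" {
  type        = string
  description = "Client project name, used in Name tags."
}

variable "environment" {
  type        = string
  description = "Environment name, e.g. staging or prod."
}

variable "cidr" {
  type        = string
  description = "CIDR block for the VPC."
}

variable "private_subnet_cidrs" {
  type        = list(string)
  description = "One CIDR per private subnet."
}

variable "azs" {
  type        = list(string)
  description = "Availability zones, index-aligned with private_subnet_cidrs."
}

variable "common_tags" {
  type        = map(string)
  description = "Tags merged into every resource."
  default     = {}
}

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the VPC."
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, in input order."
  value       = aws_subnet.private[*].id
}
```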
Our patterns:
- Modules for reuse, not abstraction. Modules group related resources. They don't hide complexity. Every input and output is documented.
- Remote state in S3 with DynamoDB locking. Non-negotiable. Local state is a ticking time bomb.
- Workspaces for environments. Same configuration, different state files: `terraform workspace select prod && terraform apply`.
- `terraform plan` in CI, `terraform apply` manually. Every PR shows the plan. Apply requires explicit approval.
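A sketch of the state and workspace setup described above. Bucket, key, and table names are placeholders, and the per-environment CIDRs are made up for illustration. The DynamoDB table needs a string partition key named `LockID`, and with the S3 backend each non-default workspace's state is stored under the `env:/<workspace>/` prefix by default.

```hcl
# backend.tf -- names are placeholders
terraform {
  backend "s3" {
    bucket         = "example-client-tfstate"
    key            = "infra/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "example-client-tf-locks"
    encrypt        = true
  }
}

# main.tf -- one configuration; the selected workspace picks the settings
locals {
  env_settings = {
    staging = {
      cidr                 = "10.1.0.0/16"
      private_subnet_cidrs = ["10.1.1.0/24", "10.1.2.0/24"]
    }
    prod = {
      cidr                 = "10.0.0.0/16"
      private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
    }
  }

  # Fails for any workspace not listed above, including "default".
  env = local.env_settings[terraform.workspace]
}

module "vpc" {
  source = "./modules/vpc"

  project              = "example-client"
  environment          = terraform.workspace
  cidr                 = local.env.cidr
  private_subnet_cidrs = local.env.private_subnet_cidrs
  azs                  = ["eu-west-1a", "eu-west-1b"]
  common_tags          = { ManagedBy = "terraform" }
}
```

Looking settings up by `terraform.workspace` doubles as a guard: in the `default` workspace, or any workspace not in the map, the lookup fails at plan time instead of quietly creating a stray environment.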
In production
We manage infrastructure for 8 client projects with Terraform. Patterns that emerged:
- One repo per client, not one repo for all clients. Client infrastructure has different lifecycles, access controls, and compliance requirements. Shared repos create coupling.
- Pin provider versions. An AWS provider update changed the default behavior of S3 bucket ACLs and broke object access for a client. We now pin exact versions and upgrade deliberately.
- `terraform import` is your friend. Clients often have existing infrastructure created via the console. We import it into Terraform state rather than recreating it, avoiding downtime.
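Both habits can also live in configuration. A sketch, assuming Terraform 1.5 or later (needed for `import` blocks); the provider version and bucket name are placeholders:

```hcl
terraform {
  required_version = ">= 1.5.0" # import blocks need Terraform 1.5+

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "= 5.31.0" # exact pin (placeholder); upgrade deliberately
    }
  }
}

# Adopt a console-created bucket into state without recreating it.
import {
  to = aws_s3_bucket.assets
  id = "example-client-assets"
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-client-assets"
}
```

`terraform init` records the selected provider in `.terraform.lock.hcl`, which belongs in version control next to the pin. Import blocks are planned like any other change, so the PR shows the import before it happens; `terraform plan -generate-config-out=generated.tf` can draft resource blocks for existing infrastructure, though the output needs review.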
The tradeoffs
- HCL is verbose for complex logic. Creating conditional resources, dynamic blocks, and cross-module references in HCL is clunkier than in a real programming language. We accept this trade for explicitness.
- Testing is limited. `terraform validate` catches syntax errors and invalid references. `terraform plan` catches configuration errors. Neither catches logical errors. We rely on staging environments for validation.
- Drift detection is passive. Terraform only detects drift when you run `terraform plan`. Unlike Argo CD for Kubernetes, there's no continuous reconciliation. We run `terraform plan` on a weekly cron to catch drift.
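The built-in `terraform test` command can cover part of the gap for module logic. A sketch of a plan-only test for the VPC module in this post, assuming Terraform 1.7+ for `mock_provider` (so no AWS credentials are needed); the file path and input values are illustrative:

```hcl
# modules/vpc/tests/vpc.tftest.hcl -- run `terraform test` from modules/vpc after init
mock_provider "aws" {}

variables {
  project              = "example"
  environment          = "staging"
  cidr                 = "10.1.0.0/16"
  private_subnet_cidrs = ["10.1.1.0/24", "10.1.2.0/24"]
  azs                  = ["eu-west-1a", "eu-west-1b"]
  common_tags          = {}
}

run "subnets_follow_inputs" {
  command = plan

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected one private subnet per CIDR."
  }

  assert {
    condition     = aws_subnet.private[1].availability_zone == "eu-west-1b"
    error_message = "Subnets should follow the AZ list in input order."
  }

  assert {
    condition     = aws_vpc.main.tags["Name"] == "example-staging-vpc"
    error_message = "VPC Name tag must follow project-environment-vpc."
  }
}
```

Assertions like these check how inputs map to resources, not whether the resulting network behaves correctly, so staging remains the real validation.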
When we'd choose Pulumi
- A team of full-stack TypeScript developers with no HCL experience and no desire to learn it
- A project that requires complex infrastructure composition with heavy reuse across many projects
- An organization building an internal platform where infrastructure is consumed as library components
Our recommendation
For most teams provisioning cloud infrastructure, Terraform is the right choice. The ecosystem is the largest, the knowledge base is the deepest, and HCL's constraints prevent the kind of over-engineering that turns infrastructure code into application code.
Choose Pulumi if your team already knows it, if you need infrastructure-as-library patterns, or if HCL's limitations are actively causing pain on your projects. Both tools work. Terraform's failure modes are better documented.