GitOps with ArgoCD: Git Push Is the Only Deploy Command
The last time someone deployed from their laptop -- running kubectl apply against production at 11 PM on a Friday -- we spent the weekend debugging a config drift that broke auth for 200 users. That Monday, we started migrating to GitOps.
Six months later, we manage 40+ applications across 3 Kubernetes clusters with ArgoCD. No CI pipeline has credentials to the cluster. No developer runs kubectl against production. Git push is the only deploy command.
The problem
Traditional deployment pipelines create a gap between what's defined and what's running. CI builds an image, pushes it to a registry, then runs kubectl or helm to apply changes. The cluster's actual state lives in the cluster, not in a repository.
This creates three problems we hit repeatedly:
- Config drift. Someone runs `kubectl edit` to fix a production issue. The change works, but it's never committed. The next deploy reverts it. Outage number two.
- No audit trail. "Who changed the replica count from 3 to 1?" Nobody knows. `kubectl` doesn't write commit messages.
- Credential sprawl. Every CI pipeline needs cluster credentials. Every developer with kubectl access is a deployment vector. The blast radius of a leaked token is the entire cluster.
How we think about this
Three principles guide our GitOps practice:
- Git is the single source of truth. If it's not in the repo, it doesn't exist. No manual changes, no imperative commands.
- Pull, don't push. The cluster pulls its desired state from Git. No external system pushes changes into the cluster.
- Drift is a bug. If the cluster state differs from Git, something is wrong and should be auto-corrected.
ArgoCD in our stack
ArgoCD runs as a controller inside each cluster. It watches Git repositories, compares the desired state (Helm charts in Git) with the actual state (what's running in the cluster), and reconciles the difference.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: patient-portal
  namespace: argocd
spec:
  project: healthcare
  source:
    repoURL: https://github.com/commitx/infra.git
    targetRevision: main
    path: services/patient-portal
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
The key settings:
- `automated.prune: true` -- If a resource is removed from Git, ArgoCD deletes it from the cluster. No orphaned resources.
- `automated.selfHeal: true` -- If someone manually modifies a resource, ArgoCD reverts it within 3 minutes. Config drift is automatically corrected.
- `source.helm.valueFiles` -- Environment-specific values come from the same repo. Staging and production differ only in their values files, not in their templates.
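To make the values-file split concrete, here is a sketch of how a staging/production pair might differ. The keys and numbers are illustrative, not our actual configuration:

```yaml
# values-staging.yaml (illustrative)
image:
  tag: v1.8.3
replicaCount: 1
resources:
  requests:
    cpu: 100m

---
# values-prod.yaml (illustrative)
image:
  tag: v1.8.2   # prod trails staging until the PR bumping it merges
replicaCount: 3
resources:
  requests:
    cpu: 500m
```

The Helm templates are identical for both environments; only these files change between staging and production syncs.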
The deployment workflow
- Developer opens a PR that changes `values-prod.yaml` (e.g., bumps the image tag)
- CI runs linting and Helm template validation on the PR
- PR is reviewed and merged to `main`
- ArgoCD detects the change within 3 minutes (configurable polling interval, or use webhooks for instant sync)
- ArgoCD applies the diff to the cluster
- If the new pods fail health checks, the rollout stalls -- old pods continue serving
Rollback: `git revert <commit> && git push`. ArgoCD syncs the previous state. Total rollback time: under 4 minutes including the Git operation.
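The rollback path can be sketched end to end in a throwaway repo. The file name and commit messages below are illustrative, not our real infra repo:

```shell
# Simulate a bad deploy and its revert in a scratch Git repo.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email gitops@example.com
git config user.name gitops-demo

echo "tag: v1" > values-prod.yaml
git add values-prod.yaml
git commit -qm "deploy v1"

echo "tag: v2" > values-prod.yaml
git commit -qam "deploy v2"

# v2 is bad: revert the commit. In production this is followed by
# `git push`, and ArgoCD syncs the cluster back to the v1 state.
git revert --no-edit HEAD

cat values-prod.yaml   # back to "tag: v1"
```

No cluster credentials are involved at any point; the revert is an ordinary commit, so it shows up in the audit trail like any other deploy.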
What we learned running 40+ applications
- ApplicationSets for multi-cluster. We don't create Application manifests by hand. ArgoCD's ApplicationSet controller generates them from a list of services and clusters. Adding a new service means adding a directory to the infra repo -- ArgoCD picks it up automatically.
- Wave-based sync for dependencies. Services that depend on databases or config maps use sync waves. The database migration job runs in wave 1, the application deployment in wave 2. ArgoCD respects the ordering.
- Notifications to Slack. ArgoCD's notification controller posts sync status to team channels. Every deploy, every drift detection, every failed sync is visible without checking the dashboard.
- RBAC per team. ArgoCD's project abstraction lets us scope access. The frontend team sees their applications. The platform team sees everything. Nobody accidentally syncs a service they don't own.
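As a sketch of the ApplicationSet approach described above (the repo URL matches the earlier manifest, but the project name, destination, and wildcard path are illustrative), a Git directory generator turns every directory under `services/` into an Application:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    # Git directory generator: every directory matching services/*
    # in the infra repo becomes one Application automatically.
    - git:
        repoURL: https://github.com/commitx/infra.git
        revision: main
        directories:
          - path: services/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: healthcare
      source:
        repoURL: https://github.com/commitx/infra.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Sync waves, by contrast, need no extra controller: an annotation like `argocd.argoproj.io/sync-wave: "1"` on the migration Job and `"2"` on the Deployment is enough for ArgoCD to order them.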
ArgoCD vs Flux
We chose ArgoCD over Flux for one reason: the dashboard. Visualizing the sync state, health status, and resource tree of 40 applications at a glance is invaluable during incidents. Flux is technically excellent -- lighter weight, more Unix-philosophy -- but the operational visibility of ArgoCD's UI saved us hours of debugging over the first quarter.
The tradeoffs
- Learning curve. ArgoCD's Application, AppProject, and ApplicationSet abstractions take time to internalize. Budget a week for the team to get comfortable.
- Git repo structure matters. A poorly organized infra repo creates ArgoCD pain. We standardize on `services/<name>/Chart.yaml` with per-environment values files. Deviation from this structure causes sync confusion.
- Secrets management is separate. ArgoCD syncs what's in Git. Secrets should not be in Git. We use Sealed Secrets (encrypted in Git, decrypted in cluster) or external-secrets-operator backed by Vault.
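For the Sealed Secrets route, what lands in Git is only the encrypted form. A sketch, with an illustrative name and a truncated ciphertext placeholder:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: patient-portal-db
  namespace: production
spec:
  encryptedData:
    # Output of `kubeseal` -- safe to commit; only the controller's
    # private key inside the cluster can decrypt it.
    DB_PASSWORD: AgBy3i4OJSWK...
```

The sealed-secrets controller in the cluster unseals this into an ordinary `Secret`, so the application consumes it exactly as if it had been created by hand -- but Git never holds plaintext.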
Our recommendation
If you run more than 5 services on Kubernetes and more than one person deploys to production, adopt GitOps. ArgoCD is the most production-ready implementation we've used. The setup cost is a day. The return is every deployment audited, every drift corrected, and every rollback a `git revert` away.