GitOps with ArgoCD: Git Push Is the Only Deploy Command
The last time someone deployed from their laptop -- running kubectl apply against production at 11 PM on a Friday -- we spent the weekend debugging a config drift that broke auth for 200 users. That Monday, we started migrating to GitOps.
Six months later, we manage 40+ applications across 3 Kubernetes clusters with ArgoCD. No CI pipeline has credentials to the cluster. No developer runs kubectl against production. Git push is the only deploy command.
The problem
Traditional deployment pipelines create a gap between what's defined and what's running. CI builds an image, pushes it to a registry, then runs kubectl or helm to apply changes. The cluster's actual state lives in the cluster, not in a repository.
This creates three problems we hit repeatedly:
- Config drift. Someone runs `kubectl edit` to fix a production issue. The change works, but it's never committed. The next deploy reverts it. Outage number two.
- No audit trail. "Who changed the replica count from 3 to 1?" Nobody knows. `kubectl` doesn't write commit messages.
- Credential sprawl. Every CI pipeline needs cluster credentials. Every developer with kubectl access is a deployment vector. The blast radius of a leaked token is the entire cluster.
How we think about this
Three principles guide our GitOps practice:
- Git is the single source of truth. If it's not in the repo, it doesn't exist. No manual changes, no imperative commands.
- Pull, don't push. The cluster pulls its desired state from Git. No external system pushes changes into the cluster.
- Drift is a bug. If the cluster state differs from Git, something is wrong and should be auto-corrected.
ArgoCD in our stack
ArgoCD runs as a controller inside each cluster. It watches Git repositories, compares the desired state (Helm charts in Git) with the actual state (what's running in the cluster), and reconciles the difference.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: patient-portal
  namespace: argocd
spec:
  project: healthcare
  source:
    repoURL: https://github.com/commitx/infra.git
    targetRevision: main
    path: services/patient-portal
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
The key settings:
- `automated.prune: true` -- If a resource is removed from Git, ArgoCD deletes it from the cluster. No orphaned resources.
- `automated.selfHeal: true` -- If someone manually modifies a resource, ArgoCD reverts it within 3 minutes. Config drift is automatically corrected.
- `source.helm.valueFiles` -- Environment-specific values come from the same repo. Staging and production differ only in their values files, not in their templates.
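To make the values-file split concrete, here is a sketch of how a staging/production pair might differ. The keys and numbers are illustrative, not our actual configuration:

```yaml
# values-staging.yaml (illustrative)
image:
  tag: v1.8.3
replicaCount: 1
resources:
  requests:
    cpu: 100m

---
# values-prod.yaml (illustrative)
image:
  tag: v1.8.2   # prod trails staging until the PR bumping it merges
replicaCount: 3
resources:
  requests:
    cpu: 500m
```

The Helm templates are identical for both environments; only these files change between staging and production syncs.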
The deployment workflow
- Developer opens a PR that changes `values-prod.yaml` (e.g., bumps the image tag)
- CI runs linting and Helm template validation on the PR
- PR is reviewed and merged to `main`
- ArgoCD detects the change within 3 minutes (configurable polling interval, or use webhooks for instant sync)
- ArgoCD applies the diff to the cluster
- If the new pods fail health checks, the rollout stalls -- old pods continue serving
Rollback: `git revert <commit> && git push`. ArgoCD syncs the previous state. Total rollback time: under 4 minutes including the Git operation.
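The rollback path can be sketched end to end in a throwaway repo. The file name and commit messages below are illustrative, not our real infra repo:

```shell
# Simulate a bad deploy and its revert in a scratch Git repo.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email gitops@example.com
git config user.name gitops-demo

echo "tag: v1" > values-prod.yaml
git add values-prod.yaml
git commit -qm "deploy v1"

echo "tag: v2" > values-prod.yaml
git commit -qam "deploy v2"

# v2 is bad: revert the commit. In production this is followed by
# `git push`, and ArgoCD syncs the cluster back to the v1 state.
git revert --no-edit HEAD

cat values-prod.yaml   # back to "tag: v1"
```

No cluster credentials are involved at any point; the revert is an ordinary commit, so it shows up in the audit trail like any other deploy.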
What we learned running 40+ applications
- ApplicationSets for multi-cluster. We don't create Application manifests by hand. ArgoCD's ApplicationSet controller generates them from a list of services and clusters. Adding a new service means adding a directory to the infra repo -- ArgoCD picks it up automatically.
- Wave-based sync for dependencies. Services that depend on databases or config maps use sync waves. The database migration job runs in wave 1, the application deployment in wave 2. ArgoCD respects the ordering.
- Notifications to Slack. ArgoCD's notification controller posts sync status to team channels. Every deploy, every drift detection, every failed sync is visible without checking the dashboard.
- RBAC per team. ArgoCD's project abstraction lets us scope access. The frontend team sees their applications. The platform team sees everything. Nobody accidentally syncs a service they don't own.
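As a sketch of the ApplicationSet approach described above (the repo URL matches the earlier manifest, but the project name, destination, and wildcard path are illustrative), a Git directory generator turns every directory under `services/` into an Application:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    # Git directory generator: every directory matching services/*
    # in the infra repo becomes one Application automatically.
    - git:
        repoURL: https://github.com/commitx/infra.git
        revision: main
        directories:
          - path: services/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: healthcare
      source:
        repoURL: https://github.com/commitx/infra.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Sync waves, by contrast, need no extra controller: an annotation like `argocd.argoproj.io/sync-wave: "1"` on the migration Job and `"2"` on the Deployment is enough for ArgoCD to order them.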
ArgoCD vs Flux
We chose ArgoCD over Flux for one reason: the dashboard. Visualizing the sync state, health status, and resource tree of 40 applications at a glance is invaluable during incidents. Flux is technically excellent -- lighter weight, more Unix-philosophy -- but the operational visibility of ArgoCD's UI saved us hours of debugging over the first quarter.
The tradeoffs
- Learning curve. ArgoCD's Application, AppProject, and ApplicationSet abstractions take time to internalize. Budget a week for the team to get comfortable.
- Git repo structure matters. A poorly organized infra repo creates ArgoCD pain. We standardize on `services/<name>/Chart.yaml` with per-environment values files. Deviation from this structure causes sync confusion.
- Secrets management is separate. ArgoCD syncs what's in Git. Secrets should not be in Git. We use Sealed Secrets (encrypted in Git, decrypted in cluster) or external-secrets-operator backed by Vault.
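For the Sealed Secrets route, what lands in Git is only the encrypted form. A sketch, with an illustrative name and a truncated ciphertext placeholder:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: patient-portal-db
  namespace: production
spec:
  encryptedData:
    # Output of `kubeseal` -- safe to commit; only the controller's
    # private key inside the cluster can decrypt it.
    DB_PASSWORD: AgBy3i4OJSWK...
```

The sealed-secrets controller in the cluster unseals this into an ordinary `Secret`, so the application consumes it exactly as if it had been created by hand -- but Git never holds plaintext.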
Our recommendation
If you run more than 5 services on Kubernetes and more than one person deploys to production, adopt GitOps. ArgoCD is the most production-ready implementation we've used. The setup cost is a day. The return is every deployment audited, every drift corrected, and every rollback a `git revert` away.