Prompt engineering for infrastructure — what works and what doesn't

2026-03-31 Rico Twesten-Weber Principal DevOps Engineer

Tags: ai, prompt-engineering, devops, claude-code

Every prompt engineering guide starts with “be specific.” Great advice if you’re generating a blog post or summarizing meeting notes. Completely insufficient when you’re generating infrastructure code that has to actually run on a real cluster.

I’ve spent months using Claude Code to generate Helm charts, Terraform modules, pipeline configs, and Kubernetes manifests. The failure modes are different from text generation, and the patterns that actually work are not what most guides teach.

Constraints beat descriptions

The single most effective pattern I’ve found for infrastructure prompts is constraint-based generation. Instead of describing what you want, specify what the output must satisfy.

Bad prompt: “Generate a Kubernetes deployment for a Node.js API service with proper resource management and security settings.”

That produces something plausible. It’ll have resource limits and a securityContext. The values will be reasonable defaults that have nothing to do with your environment. The image tag will be latest. It’ll work on a fresh minikube cluster and break on everything else.

Better prompt: “Generate a Helm chart for a Node.js API. Constraints: image tag must come from .Values.image.tag, not hardcoded. CPU limit between 200m and 500m. Memory limit between 256Mi and 512Mi. readOnlyRootFilesystem: true. runAsNonRoot: true. Must include a PodDisruptionBudget with minAvailable: 1. Ingress class is traefik, not nginx.”

The difference is that constraints are testable. You can verify whether the output satisfies each one. Descriptions are subjective. “Proper resource management” means different things to different people and different clusters.

I now write prompts the same way I’d write acceptance criteria for a ticket. If I can’t verify whether the output meets the requirement, the requirement is too vague.
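To make "testable" concrete, here is a minimal sketch of checking a generated Deployment against some of the constraints from the prompt above. It is an illustration, not my actual tooling: the manifest is a hypothetical, hand-written dict (so the example needs no YAML library), the helper names are made up, and the quantity parsers only handle the few suffixes used here.

```python
import re

# Constraint bounds taken from the example prompt above.
CPU_RANGE_M = (200, 500)      # millicores
MEMORY_RANGE_MI = (256, 512)  # MiB

# Hypothetical rendered Deployment, written as a plain dict. In practice this
# would come from parsing the rendered YAML.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{
        "name": "api",
        "resources": {"limits": {"cpu": "300m", "memory": "384Mi"}},
        "securityContext": {"readOnlyRootFilesystem": True, "runAsNonRoot": True},
    }]}}},
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '300m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Convert a memory quantity in Mi or Gi to MiB (other suffixes not handled)."""
    match = re.fullmatch(r"(\d+)(Mi|Gi)", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 1024 if unit == "Gi" else value


def check_constraints(manifest: dict) -> list[str]:
    """Return one message per violated constraint; an empty list means pass.

    Only container-level securityContext is inspected in this sketch.
    """
    failures = []
    for container in manifest["spec"]["template"]["spec"]["containers"]:
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        security = container.get("securityContext", {})

        cpu = limits.get("cpu")
        if cpu is None or not CPU_RANGE_M[0] <= cpu_millicores(cpu) <= CPU_RANGE_M[1]:
            failures.append(f"{name}: cpu limit {cpu!r} outside 200m-500m")

        memory = limits.get("memory")
        if memory is None or not MEMORY_RANGE_MI[0] <= memory_mib(memory) <= MEMORY_RANGE_MI[1]:
            failures.append(f"{name}: memory limit {memory!r} outside 256Mi-512Mi")

        for flag in ("readOnlyRootFilesystem", "runAsNonRoot"):
            if security.get(flag) is not True:
                failures.append(f"{name}: securityContext.{flag} must be true")
    return failures


if __name__ == "__main__":
    for message in check_constraints(deployment) or ["all constraints satisfied"]:
        print(message)
```

Each constraint maps to one check with a yes-or-no answer. There is no equivalent function you could write for "proper resource management."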

Convention files change everything

The second pattern that actually works is feeding your org’s conventions as system context. Not as part of the prompt itself, but as a separate document that the model treats as ground truth.

My convention file covers naming patterns, required labels, annotation standards, resource tiers, security baselines, and ingress configuration. It’s about 800 lines of structured rules. When I include it as context, the AI generates code that follows our patterns without me restating them every time.

This works because infrastructure conventions are consistent and rule-based. They don’t change between prompts. A convention file encodes the same knowledge that a senior engineer carries in their head and applies unconsciously during code review.

Without it, every prompt has to re-specify everything. With it, I can focus the prompt on what’s unique about this particular resource.
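As a rough sketch of that separation, the snippet below keeps the conventions and the per-resource request in two distinct fields. Everything in it is hypothetical: the `PromptRequest` type, the `build_request` helper, and the three-line conventions excerpt are invented for illustration, and how the system part actually reaches the model depends on the tool you use.

```python
from dataclasses import dataclass

# Hypothetical excerpt; a real convention file would be far longer.
CONVENTIONS = """
Required labels: app.kubernetes.io/name, app.kubernetes.io/part-of
Ingress class: traefik
Security baseline: runAsNonRoot and readOnlyRootFilesystem on every container
"""


@dataclass
class PromptRequest:
    system: str  # conventions; identical for every prompt
    user: str    # what is unique about this particular resource


def build_request(conventions: str, task: str, constraints: list[str]) -> PromptRequest:
    """Keep org-wide rules out of the task text so they are never restated."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    user = f"{task}\n\nConstraints:\n{constraint_lines}"
    return PromptRequest(system=conventions.strip(), user=user)


if __name__ == "__main__":
    request = build_request(
        CONVENTIONS,
        "Generate a Deployment for the billing API.",
        ["CPU limit between 200m and 500m", "Memory limit between 256Mi and 512Mi"],
    )
    print(request.user)
```

The user part stays short because it only carries what differs from one resource to the next.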

Example-based prompts copy bad patterns

Here’s where most advice goes wrong for infrastructure: “Show the model an example and ask for something similar.”

I tried this early on. Gave Claude Code an existing Terraform module and asked it to generate a new one for a different resource. The output was structurally similar. It also faithfully reproduced every shortcut and anti-pattern in the original.

The example module had hardcoded values that should have been variables. The generated module had hardcoded values too, just different ones. The example used a deprecated provider syntax. The generated code used the same deprecated syntax.

Example-based generation optimizes for similarity, not correctness. When your examples are perfect, this works great. When your examples contain the accumulated compromises of two years of “we’ll fix it later,” you’re just automating technical debt.

What works instead: describe the pattern abstractly in your convention file, then use constraints to specify the instance. The AI follows the abstract pattern without inheriting the specific shortcuts.

Open-ended generation is a trap

“Create a Kubernetes deployment for my app.” Seven words that produce 50 lines of YAML that will not work in your cluster.

The AI generates for the most common case. It assumes nginx-ingress because that’s the most common ingress controller. It assumes the default namespace because most tutorials use it. It sets resource limits based on what looks reasonable in a generic context, not what your monitoring data says this service actually needs.

Open-ended generation produces output that looks like documentation examples. Clean, well-structured, and wrong for your specific setup. The danger is that it passes casual review. You glance at it, see familiar patterns, and assume it’s fine.

I’ve learned to never generate infrastructure code without at least specifying the target environment, the ingress controller, the resource tier, and any service-specific requirements. If I can’t provide those constraints, I’m not ready to generate the code.
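One way to enforce that rule is a preflight check that refuses to build a prompt until the four inputs are filled in. This is a minimal sketch; the `GenerationContext` class, its field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GenerationContext:
    """The minimum information needed before generating anything (hypothetical)."""
    target_environment: str = ""
    ingress_controller: str = ""
    resource_tier: str = ""
    service_requirements: list[str] = field(default_factory=list)


def missing_fields(ctx: GenerationContext) -> list[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]


def require_complete(ctx: GenerationContext) -> None:
    """Raise instead of letting an under-specified prompt go out."""
    missing = missing_fields(ctx)
    if missing:
        raise ValueError("not ready to generate, missing: " + ", ".join(missing))


if __name__ == "__main__":
    ctx = GenerationContext(target_environment="staging", ingress_controller="traefik")
    print(missing_fields(ctx))
```

If the check fails, the missing field names are the questions to answer before generating anything.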

The validation loop

The pattern that ties everything together: generate, validate, fix, commit.

Generation is just step one. After Claude Code produces a manifest, I run it through helm template to render the output. Then kubeconform for schema validation. Then a dry-run against a staging cluster. Each step catches different categories of errors.

The critical insight: feed validation failures back into the prompt. If kubeconform rejects the output, include the error message and ask for a corrected version. This creates a feedback loop within the session: the model isn’t retrained, but with the exact error in its context it can correct the specific mistake instead of guessing.

Three iterations usually get to valid output. More than five iterations means the prompt is too vague or the task is too complex for generation. At that point, I write it manually and add the pattern to the convention file for next time.
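The loop can be sketched roughly as follows. This is an illustration under assumptions, not a description of my actual scripts: `generate` stands in for whatever calls the model, `generate_until_valid` is a made-up name, and the kubeconform helper assumes the binary is on PATH and reads the manifest from stdin. The helm template and dry-run steps would plug in as further validators in the same shape.

```python
import subprocess
from typing import Callable, Optional

MAX_ITERATIONS = 5  # beyond this, write it by hand


def kubeconform_errors(manifest: str) -> list[str]:
    """Schema-validate rendered YAML by piping it to kubeconform on stdin.

    Assumes kubeconform is installed; returns its output lines on failure.
    """
    result = subprocess.run(
        ["kubeconform", "-strict", "-summary"],
        input=manifest, capture_output=True, text=True,
    )
    if result.returncode == 0:
        return []
    output = result.stdout + result.stderr
    return [line for line in output.splitlines() if line.strip()]


def generate_until_valid(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[Optional[str], int]:
    """Generate, validate, and feed errors back until the output passes.

    Returns (manifest, attempts), or (None, max_iterations) if it never passes.
    """
    current_prompt = prompt
    for attempt in range(1, max_iterations + 1):
        manifest = generate(current_prompt)
        errors = validate(manifest)
        if not errors:
            return manifest, attempt
        # Include the exact validator output so the next attempt can target it.
        current_prompt = (
            f"{prompt}\n\nThe previous output failed validation:\n"
            + "\n".join(errors)
            + f"\n\nPrevious output:\n{manifest}\n\nReturn a corrected version."
        )
    return None, max_iterations
```

Passing the validator in as a function keeps the loop independent of any one tool, so the same code can run against kubeconform, a dry-run wrapper, or a stub in tests. A `None` result is the signal to stop iterating and write the manifest manually.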

Prompt engineering for infra is spec writing

After months of iteration, I’ve realized that writing good infrastructure prompts is basically writing specs. You define constraints, reference standards, specify validation criteria, and expect testable output.

This shouldn’t be surprising. Infrastructure code is all about conforming to precise requirements. The cluster doesn’t care about your intent. It cares whether the YAML matches the expected schema, whether the ports line up, whether the resource requests fit within a node’s allocatable capacity.

The generic “be creative, be specific, provide examples” advice works for text. For infrastructure, the advice is simpler: write a spec. Specify constraints. Include your conventions. Validate the output. And don’t trust anything that hasn’t been tested against a real cluster, no matter how clean the YAML looks.