infra-bench
Open infrastructure benchmark for AI agents.
infra-bench compares AI agents on reproducible infrastructure repair and operations tasks.
Dataset
Kubernetes-first infrastructure benchmark scenarios for AI agents.
Difficulty tiers: Easy, Medium, Hard
Latest Run Pass Rate
gpt-5.5 medium (OpenAI): 84.5% pass rate (49/58)
Pass Rate By Difficulty
Easy: 21/22
Medium: 20/23
Hard: 8/13
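As a quick consistency check, the per-difficulty counts can be turned into percentages and summed to reproduce the overall figure. The sketch below is illustrative only: the `results` dictionary and the `pass_rate` helper are hypothetical names, not part of infra-bench, and the counts are copied by hand from the figures reported here.

```python
# Pass/total counts copied by hand from the "Pass Rate By Difficulty" figures.
# The names `results` and `pass_rate` are illustrative, not an infra-bench API.
results = {"Easy": (21, 22), "Medium": (20, 23), "Hard": (8, 13)}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Percentage for each difficulty tier.
per_difficulty = {name: pass_rate(p, t) for name, (p, t) in results.items()}

# Overall rate is computed from summed counts, not by averaging the tiers.
total_passed = sum(p for p, _ in results.values())
total_tasks = sum(t for _, t in results.values())
overall = pass_rate(total_passed, total_tasks)

for name, rate in per_difficulty.items():
    print(f"{name}: {rate}%")
print(f"Overall: {overall}% ({total_passed}/{total_tasks})")
```

The overall rate is taken over the pooled counts, so tiers with more tasks weigh more than they would in a plain average of the three percentages.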
[Charts: Pass Rate By Category; Task Duration Distribution]
Task Details
add-maintenance-toleration
add-rbac-list-verb
allow-api-network-policy
clear-upgrade-blocking-apis
complete-namespace-restore-preserving-state
complete-node-pool-drain-migration
complete-staging-namespace-restore
coordinate-secret-rotation-rollout
debug-service-endpoints
fix-config-key-reference
fix-controller-service-selector
fix-crashloop-env-var
fix-hpa-scale-target
fix-job-command-argument
fix-node-selector-mismatch
fix-pvc-mount-claim
fix-quirky-health-endpoint
fix-restricted-security-context
fix-service-dns-name
fix-simple-cr-field
harden-payments-stack-without-breaking-runtime
place-inference-canary-on-gpu-node
prepare-node-drain-with-pdb
reconnect-checkout-worker-queue
reconnect-frontend-api
recover-api-rollout-after-config-change
recover-nightly-report-cronjob
recover-web-rollout-after-bad-release
repair-cache-volume-binding
repair-cross-namespace-service-discovery
repair-ingress-backend-port
repair-payment-tenant-network-boundary
repair-plugin-driven-app-startup
repair-readiness-probe-path
repair-report-custom-resource-status
repair-report-operator-finalizer-reconcile
repair-restricted-multi-container-pod
repair-secret-projection-reload
repair-sidecar-generated-config
repair-statefulset-headless-service-identity
repair-worker-hpa-scaling-inputs
replace-deprecated-ingress-api
restore-alert-signal-after-telemetry-split
restore-checkout-network-path
restore-grafana-logs-datasource
restore-metrics-controller-after-values-change
restore-missing-configmap
restore-multi-hop-checkout-route
restore-order-pipeline-after-queue-migration
restore-portal-ingress-tls-route
restore-stateful-cache-identity
restore-worker-config-access
rightsize-cpu-request
schedule-reporting-api-on-labeled-node
stabilize-checkout-autoscaling-under-load
stabilize-cpu-throttled-worker
target-gpu-node-label
trace-service-route-regression