infra-bench

Open infrastructure benchmark for AI agents.

infra-bench compares AI agents on reproducible infrastructure repair and operations tasks.

Dataset

Kubernetes-first infrastructure benchmark scenarios for AI agents.

58 tasks
Easy: 22
Medium: 23
Hard: 13

Latest Run Pass Rate

gpt-5.5 medium

OpenAI

84.5% pass rate (49/58)

Pass Rate By Difficulty

Easy: 21/22
Medium: 20/23
Hard: 8/13
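
The headline pass rate can be checked against the per-difficulty counts above. The following is a minimal sketch of that arithmetic, not infra-bench's own scoring code; the `results` dictionary and the `pass_rate` helper are hypothetical names introduced here for illustration, and rounding to one decimal place is an assumption based on how the figure is displayed.

```python
# Per-difficulty (passed, total) counts as listed for gpt-5.5 medium.
# The data layout is an assumption for illustration only.
results = {
    "Easy": (21, 22),
    "Medium": (20, 23),
    "Hard": (8, 13),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Pass rate within each difficulty tier.
per_difficulty = {name: pass_rate(p, t) for name, (p, t) in results.items()}

# Overall pass rate: sum the counts first, then divide, so that larger
# tiers carry proportionally more weight than smaller ones.
total_passed = sum(p for p, _ in results.values())
total_tasks = sum(t for _, t in results.values())
overall = pass_rate(total_passed, total_tasks)

for name, rate in per_difficulty.items():
    print(f"{name}: {rate}%")
print(f"Overall: {overall}% ({total_passed}/{total_tasks})")
```

Summing the counts before dividing reproduces the 49/58 total shown above; averaging the three tier percentages would give a different number because the tiers differ in size.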

Pass Rate By Category

Task Duration Distribution

Task Details

Open a task to inspect logs and artifacts.