Awesome List
A curated list of Site Reliability and Production Engineering Tools
GitHub stars and default-branch commits for SquadcastHub/awesome-sre-tools.
25 repos currently saved from this list.
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
The open-source AIOps and alert management platform
SRE Agent - CNCF Sandbox Project
Terrateam is open-source GitOps infrastructure orchestration. It integrates with GitHub to automate Terraform, OpenTofu, CDKTF, Terragrunt, and Pulumi workflows through pull requests.
Abstruse is a free and open-source CI/CD platform that tests your models and code.
An open-source alerting and incident escalation tool
Code-Native Data Privacy
AI-powered SRE platform for automated incident investigation
Open-source AI copilot that lets you chat with your observability data and code 🧙‍♂️
Lightweight schema-on-read analytics in a single binary
DoctorGPT brings GPT into production for application log error diagnosing!
Slo-exporter computes standardized SLI and SLO metrics based on events coming from various data sources.
SLOs, error windows, and alerts are complicated. Here is an attempt to make them easy.
World's first fully integrated and fully automated Kubernetes management and orchestration solution
Docker-based continuous integration, continuous deployment, and code lint review system for Bitbucket
eBPF agent and MCP server for GPU causal observability
Fast, opinionated AWS security scanner. Curated checks. Zero noise. Copy-paste fixes.
Local-first TUI for AI coding-agent session history: trace cost, tokens, time, tool failures, latency, health, diffs, reports, and CI gates across local agent logs.
Open-source incident management: alerts, on-call, AI post-mortems. Self-hosted alternative to PagerDuty & incident.io. Works with Prometheus, Grafana, Datadog, Slack, and Teams. Free forever, BYO-AI.
Tailscale MCP server for managing your tailnet from AI assistants
Generate the complete reliability stack from a service spec in 5 minutes. Dashboards, alerts, SLOs, PagerDuty - zero toil.
SSL Certificate Expiry Monitor - Monitor SSL certificates and get alerts before they expire
Free open-source monitoring dashboard for OpenClaw AI agents — token usage, session tracking, 7-day trends, multi-model support
CLI for managing Rootly incidents, alerts, services, teams, and on-call schedules
Global DNS Propagation Checker - Check DNS propagation across worldwide servers in real-time