SRE Jobs – Remote & On-Site Site Reliability Engineering Roles

Find your next site reliability engineer role. Browse remote SRE jobs and on-site positions focused on system reliability, observability, and incident response.

Site Reliability Engineers ensure that production systems remain available, performant, and resilient at scale. Our curated SRE job listings feature opportunities working with observability tools like Prometheus, Grafana, and Datadog, along with incident management platforms and chaos engineering practices. Whether you're looking for remote SRE jobs, senior site reliability engineer roles, or cloud SRE careers with AWS, Azure, or GCP, CloudOpsJobs connects you with companies committed to reliability excellence. Explore positions where you'll define SLOs, lead incident response, and build the automation that keeps systems running smoothly.

All Jobs

11 positions

SRE (Source: DevITJobs)
Onsite • Plano, Texas, Precision Drive 3301 • $70k-110k • 19 days ago

SRE (Source: RemoteOK)
Remote • Remote • $80k-120k • 25 days ago

SRE (Source: DevITJobs)
Onsite • Novi, Michigan, Town Center Drive 26200 • $160k-160k • 28 days ago

SRE (Source: DevITJobs)
Onsite • Dover, Delaware, Loockerman Plaza 98 • $209k-262k • 29 days ago

SRE (Source: DevITJobs)
Onsite • Plano, Texas, East Spring Creek Parkway 1221 • $204k-257k • 29 days ago

SRE (Source: DevITJobs)
Onsite • McLean, Virginia, Dewberry Court 1439 • $147k-184k • 29 days ago

SRE (Source: DevITJobs)
Onsite • London, Canada, Wortley Road 1 • $70k-70k • 29 days ago

SRE (Source: RemoteOK)
Remote • Global • $80k-120k • 29 days ago

About SRE Careers

Site Reliability Engineering roles focus on building and maintaining highly reliable production systems. Pioneered by Google, SRE combines software engineering with operations expertise to ensure system availability and performance.

Common Skills & Tools

  • Observability: Prometheus, Grafana, Datadog, New Relic
  • Incident Management: PagerDuty, Opsgenie, incident.io
  • SLOs/SLIs: Error budgets, availability targets
  • Chaos Engineering: Gremlin, Chaos Monkey, Litmus
  • Cloud: AWS, Azure, GCP, Kubernetes
  • Languages: Python, Go, Bash, SQL
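
As a rough illustration of how the error budgets and availability targets listed above relate to each other, the sketch below converts an availability SLO into the downtime it permits over a window. The function name `error_budget_minutes` and the 30-day default window are assumptions chosen for illustration; they do not come from any listing or tool named here.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO permits over a window.

    Hypothetical helper for illustration; the 30-day window is an assumption.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    # The error budget is the share of the window that may be unavailable.
    return total_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = error_budget_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of allowable downtime, which is the kind of figure an SRE team typically weighs when deciding whether to ship a risky change or spend time on reliability work.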