Salary: $110,000 per year

Requirements:

Over 6 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or similar roles
Proficient with cloud service providers (AWS, Azure, or GCP)
Strong skills in infrastructure as code (Terraform, CloudFormation, Pulumi, etc.)
Experience with container technologies and orchestration (Docker, Kubernetes)
Solid foundations in Linux systems administration and networking
Background in building and managing CI/CD pipelines
Practical knowledge of monitoring and observability platforms (Datadog, Prometheus, Grafana, New Relic, etc.)
Strong problem-solving abilities and incident management expertise
Experience in automation and scripting (Python, Bash, Go, or related languages)

Responsibilities:

Design, develop, and maintain highly available and fault-tolerant systems
Lead efforts to enhance reliability across both production and non-production settings
Manage and advance monitoring, alerting, and observability systems
Facilitate incident response, conduct root cause analyses, and oversee post-incident assessments
Implement automation strategies to minimize manual operational efforts
Collaborate with Engineering, Security, and Product teams to fulfill platform requirements
Define and monitor service-level indicators (SLIs), service-level objectives (SLOs), and error budgets
Lead initiatives for capacity planning and performance optimization
Refine deployment, CI/CD, and infrastructure-as-code methodologies
Identify and mitigate risks to reliability and scalability before they affect users
Mentor junior engineers and contribute to technical standards within the team
Participate in on-call rotations and enhance on-call processes

Technologies:

AWS
Azure
Bash
CI/CD
Cloud
Datadog
DevOps
Docker
GCP
Grafana
Support
Kubernetes
Linux
Prometheus
Python
Security
Terraform

More:

We are a dynamic company seeking a Senior Site Reliability Engineer to focus on the reliability, scalability, performance, and security of our production systems. The position blends software engineering and systems engineering, with the aim of building resilient infrastructure and automating processes to lower operational risk. We offer a competitive salary, comprehensive medical, dental, and vision benefits, flexible work schedules, unlimited PTO, and support for professional development, among other perks. As part of our team, you will play a key role in driving technical excellence across the organization.

Last updated: week 10 of 2026

Senior Site Reliability Engineer - Cloud Systems @ Bolt On Technology

