We are Mastercard, a global technology company powering economies and helping people, businesses, and governments thrive in more than 200 countries and territories. Our mission is to build a secure, simple, smart, and accessible digital economy, supported by innovation, partnerships, and secure data networks. Within our technology organization, we foster an inclusive culture grounded in our decency quotient, valuing diverse perspectives, collaboration, and continuous learning. This Director, Site Reliability Engineer role sits within our Business Operations team and focuses on keeping our platforms stable, resilient, and production-ready while advancing automation, operational excellence, and customer value. We offer the opportunity to work alongside experienced leaders and technologists in a global environment, with a strong emphasis on inclusion, growth, and cross-functional impact.

- We are looking for a highly motivated and experienced Director, Site Reliability Engineer with proven leadership in complex, nuanced environments.
- We need someone who can independently apply leadership judgment to support broader business goals and act as a recognized contributor.
- We value experience building diverse, high-performing teams with a strong customer focus.
- You should be able to attract, develop, and grow future-ready talent while coaching and mentoring others.
- We expect the ability to think enterprise-wide, connect work to broader business impact, and act in the best interests of the company.
- You should be comfortable leading through ambiguity across varied markets and regulatory settings, using sound judgment and cross-cultural awareness.
- We are seeking someone who can drive speed, agility, accountability, and customer-centric outcomes.
- You should bring subject matter expertise in observability, including scripting and tooling for metrics, logs, and traces.
- You need strong programming and scripting skills for automation, operational tooling, monitoring, deployment, and incident response.
- You should have experience with Linux/Unix system and network administration, including troubleshooting and reliability practices.
- We value cloud infrastructure expertise across platforms such as AWS, Azure, or GCP.
- You should understand how to design for high availability, fault tolerance, disaster recovery, and scalable systems.
- Experience with DevOps practices, including CI/CD, containerization, and orchestration, is important.
- You should have strong troubleshooting capabilities across systems, applications, and networks.
- We expect knowledge of capacity planning, performance tuning, IT service management, and proactive monitoring.
- You should be able to coach others, shape best practices, and contribute strategically across teams or the organization.

- You will develop and execute short- and medium-term strategic plans for Site Reliability Engineering across Mastercard.
- You will lead cross-functional efforts to automate operations, strengthen incident management, and improve system resilience.
- You will establish and sustain governance processes that uphold reliability standards and best practices.
- You will collaborate closely with engineering, product, and business teams to embed reliability into development and release processes.
- You will oversee incident response, ensuring fast resolution and thorough root cause analysis.
- You will promote continuous improvement, innovation, and proactive risk management across the organization.
- You will stay current on industry trends and emerging technologies related to automation and resilience.
- You will shape and drive our strategic vision for reliability, scalability, and operational efficiency.
- You will manage people leaders and/or senior individual contributors, including goal setting, performance reviews, mentoring, coaching, and talent development.
- You will help guide daily operations with a strong focus on triage, business impact analysis, and blameless post-mortems.
- You will engage early in the development lifecycle to support production readiness, operational design, automation, capacity planning, and monitoring.
- You will help manage production and change activities to improve customer experience and the value of supported applications.
- You will contribute to compliance and risk mitigation across our environments.

Director, Site Reliability Engineer
