Lead Site Reliability Engineer (SRE)
Mastercard

Job Description
At Mastercard, we drive an inclusive digital economy that enhances everyone's experience by ensuring secure and accessible transactions. Our company culture thrives on diversity and collaboration, empowering every team member to contribute their unique strengths and insights. We are committed to continuous improvement and innovation in technology, anticipating that what we build now will shape a better tomorrow.

As part of our Business Operations team, you will play a pivotal role in maintaining the reliability, scalability, and performance of the applications that support our global operations. We are looking for a motivated and experienced Lead Site Reliability Engineer (SRE) to join us in this mission.

### Qualifications

- A BS degree in Computer Science or a related technical field (e.g., physics or mathematics), or equivalent practical experience.
- Proficiency in reading, writing, and understanding code in at least one programming language.
- Strong understanding of DevOps principles, practices, and configuration management.
- Experience in designing, building, and operating large-scale distributed systems, with an emphasis on operational resilience.
- A keen interest in automation and a willingness to explore new technologies to enhance system scalability.
- Familiarity with algorithms, data structures, scripting, pipeline management, and software design.
- Systematic problem-solving abilities, strong communication skills, and a proactive sense of ownership.
- Experience in analyzing and troubleshooting large-scale distributed systems.
- Strong leadership and mentoring capabilities.
- Passion for observability, automation, and continuous improvement.
- Willingness to learn and embrace challenging opportunities while collaborating with diverse teams.
- Hands-on experience with Kubernetes and containerization technologies like Docker and Azure Container Registry.
- Strategic experience designing efficient solutions on Public Cloud platforms (AWS, Azure, or GCP), with a focus on security and performance.
- Azure DevOps (AZ-400) and Azure Cloud Developer (AZ-203) certifications preferred.
- Knowledge of security implementations, certificate management, and encryption methodologies.

### Responsibilities

- Ensure the stability and health of our platform as a Business Operations Site Reliability Engineer (SRE).
- Support developers in producing resilient products by encouraging ownership and breaking down operational barriers.
- Assist during application build phases with operational design, automation, and monitoring.
- Foster an agile culture while enforcing operational standards and providing continuous feedback.
- Engage proactively in the development lifecycle to optimize customer experience.
- Collaborate closely with cross-functional teams to monitor system behavior and detect anomalies.
- Conduct blameless post-mortems on issues to understand business impacts and improve operations.