Cloud Operations Specialist - Overnight
Knox Systems • DevOps • Onsite • Woburn, Massachusetts, Common Street 10 • $75k-95k • Posted 28 days ago
Job Description
We are seeking a Cloud Operations Specialist (L1) to join our hybrid team in Woburn, MA. This position is crucial for first-line monitoring, triage, and rapid incident response across our multi-tenant and single-tenant cloud environments, ensuring system availability, security, and compliance. Our competitive benefits package includes medical, dental, vision, life & disability insurance, and an employee-funded 401k plan. Please note that benefits are subject to change. This role requires participation in a rotating on-call schedule for after-hours incidents and holiday coverage. You will be part of a dynamic team dedicated to maintaining operational excellence in a 24x7 Network/Cloud Operations setting.
Requirements
- US Citizenship required; dual citizenship prohibited
- 1-3 years of experience in a NOC, SOC, or application support environment
- Experience with customer-facing web applications, including alert triage and incident documentation
- Familiarity with Linux administration and command-line tools
- Knowledge of AWS, Azure, or GCP infrastructure services
- Understanding of network, compute, and application monitoring fundamentals
- General application troubleshooting skills and familiarity with web technologies like HTTP, REST APIs, and JSON
- Strong attention to detail, excellent communication, and documentation abilities
- Preferred certifications: CompTIA Security+, Linux+, ITIL v4, AWS Cloud Practitioner, or Microsoft Fundamentals (AZ-900)
Responsibilities
- Monitor infrastructure, applications, and network health using tools such as Grafana, Wiz, Datadog, and CrowdStrike Falcon
- Detect, triage, and escalate alerts based on their severity and business impact
- Document incident timelines, actions taken, and resolutions in ticketing systems like ServiceNow and Jira Service Management
- Follow established FedRAMP incident handling and escalation protocols
- Execute predefined runbooks for system checks, restarts, and health verifications
- Validate the health of systems and services post-maintenance and deployment
- Assist in coordinating system patching, log collection, and audit evidence gathering
- Maintain awareness of system uptime, customer impact, and scheduled changes
- Provide basic troubleshooting support for hosted applications
- Validate API connectivity and help identify integration failures or logical errors
- Collaborate with developers and CloudOps engineers to verify deployment health post-releases
- Escalate application-related issues with detailed context, including affected users and tenant IDs
- Ensure compliance with change control, access management, and incident response procedures
- Record comprehensive incident notes and preserve compliance-ready audit trails
- Participate in Continuous Monitoring (ConMon) reporting and FedRAMP evidence collection