The Senior Site Reliability Engineer at Alphatec Spine is responsible for ensuring the availability and performance of the Informatix cloud platform. This role emphasizes automation, incident response, and operational excellence.
The Senior Site Reliability Engineer (SRE) will be responsible for ensuring the availability, performance, scalability, and operational efficiency of the Informatix cloud platform. This role is focused on reducing manual operations work (toil), automating system reliability, and ensuring production-grade observability. The ideal candidate is a systems-focused engineer who is passionate about uptime, incident response, and continuous improvement through engineering solutions.

Essential Duties and Responsibilities
• Serve as a primary contributor to the on-call rotation to maintain 24/7 uptime for production systems.
• Proactively monitor and continuously improve SLAs, SLOs, and SLIs across critical services.
• Develop and maintain robust observability tooling, including logging, metrics, and tracing (e.g., Azure Monitor, OpenTelemetry, Prometheus).
• Proactively conduct postmortems and root cause analysis; implement fixes to prevent repeat incidents.
• Identify and eliminate manual operational toil through scripting and automation.
• Design and maintain automated incident detection and response systems.
• Establish and maintain runbooks, playbooks, and escalation protocols for system support.
• Contribute to chaos testing and failure injection to proactively uncover weaknesses.
• Promote a culture of operational excellence through data-driven reliability practices.
• Proactively communicate status.

Requirements
The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
• 5+ years of experience in Site Reliability Engineering, systems engineering, or DevOps roles.
• Expertise in monitoring and observability platforms (e.g., Grafana, Prometheus, ELK, Azure Monitor).
• Solid background in incident response, root cause analysis, and on-call rotations.
• Deep knowledge of Microsoft Azure, including containerized services (AKS), networking, and storage.
• Strong automation and scripting experience (e.g., Python, Bash, PowerShell).
• Familiarity with IaC tools such as Terraform, Bicep, or ARM templates.
• Experience implementing SLIs/SLOs, operational dashboards, and error budgets.
• Comfortable designing for resiliency, failover, and graceful degradation.
• Knowledge of compliance frameworks (e.g., SOC 2, HITRUST, IEC 62304) is a plus.
• Strong written and verbal communication with a focus on transparency and learning.

Education and Experience
• BS/MS in Computer Science, Engineering, or related technical field preferred.
• 5+ years in production engineering roles with direct ownership of critical systems.
• Microsoft certifications are a plus.

For roles based in the United States that require access to hospital facilities, candidates must be eligible for and maintain credentials at all required hospitals, including meeting any applicable physical requirements or vaccination requirements (including the COVID-19 vaccine, as applicable).

ATEC is committed to providing equal employment opportunities to its employees and applicants without regard to race, color, religion, national origin, age, sex, sexual orientation, gender identity, gender expression, or any other protected status in accordance with all applicable federal, state, or local laws. Further, ATEC will make reasonable accommodations that are necessary to comply with disability discrimination laws.

Salary Range
Alphatec Spine, Inc. complies with state and federal wage and hour laws. Compensation depends upon the candidate’s qualifications, education, skill set, years of experience, and internal equity.
Full-Time Salary Range: $135,000 to $145,000
Oracle is seeking a Principal Site Reliability DevOps Engineer to join their new Oracle Health organization, focusing on automation and product deployment in healthcare. This remote position requires US citizenship and offers a competitive salary range.
Futurex is seeking a Sr. Kubernetes DevOps Engineer to design, implement, and manage DevOps pipelines and Kubernetes infrastructure. The role requires expertise in automation, system reliability, and optimizing application delivery in a containerized environment.
The Boston Red Sox are seeking a DevOps and Site Reliability Engineer to enhance their Baseball Operations systems through cloud operations and automation. This full-time hybrid role focuses on Azure infrastructure and CI/CD pipeline development.
Paradyme Management is seeking a DevOps/Site Reliability Engineer (SRE) with Secret Clearance to manage and optimize Kubernetes clusters and cloud infrastructure. The role involves collaboration across teams to ensure reliability and scalability of AI solutions.
Oracle is seeking a Senior Site Reliability Engineer / DevOps to enhance automation and reliability in their new Oracle Health organization. This remote role requires US citizenship and focuses on cloud services and large-scale distributed systems.