Job Description

As a Sr. IT Disaster Recovery Engineer, you will be responsible for ensuring the availability and resiliency of critical IT systems and services in the event of a disaster or other disruptive event. Your main duties will include designing and implementing disaster recovery plans, testing and validating recovery procedures, and leading recovery efforts to improve overall resiliency.

You will work closely with other IT professionals, including architects, system engineers, and security specialists, to identify critical systems and services, assess their risk of failure, and develop recovery strategies. You will also collaborate with business leaders to ensure that recovery plans align with business priorities and objectives.

To be successful in this role, you should have extensive experience in disaster recovery planning, including expertise in modern cloud strategies, backup and recovery solutions, replication strategies, and service containerization. You should also have strong project management skills and be able to lead and coordinate recovery efforts across multiple teams and stakeholders.

In addition, you should have excellent communication and interpersonal skills, as you will need to work closely with stakeholders at all levels of the organization to develop and implement recovery plans. You should also have a solid understanding of IT governance frameworks, such as NIST, ISO 27031 & 22301, COBIT, and ITIL, and be able to ensure compliance with relevant regulations and standards.

Essential Functions: Reasonable accommodations may be made to enable individuals with disabilities to perform the essential duties.
• Lead IT risk assessments and tabletops, and facilitate continuity exercises.
• Coordinate and monitor all disaster recovery testing exercise events to ensure activities progress according to event plans, issues are logged, and status reporting is provided to stakeholders.
• Support disaster recovery and preparedness efforts to mitigate, prepare, respond to, and recover from significant events and incidents that impact Stride.
• Partner with leadership, application, and technical teams to identify & define gaps, validate requirements, and define solutions that meet or exceed expected RTO/RPO.
• Work with leadership, application, and technical teams to document DR processes/procedures across Stride's technical environment to ensure that policies, plans, procedures, and strategies effectively provide and support a recovery framework for restoration of critical systems and data to meet or exceed established business, client, and audit requirements.
• Support the creation and integration of Stride's Resiliency & Chaos Engineering strategy, processes, tools, and execution.
• Work with technical teams to ensure that disaster recovery solutions are adequate, in place, maintained, and tested as part of the regular operational lifecycle.
• Oversee Disaster Recovery Plans, document preparedness status, report to management, and track the status of any agreed remediation items to closure.
• Provide expert guidance to, and coordinate the efforts of, relevant technology (infrastructure and application), business, and other function leaders in developing, documenting, and validating recovery procedures & plans.

Supervisory Responsibilities: This position has no formal supervisory responsibilities.

Minimum Required Qualifications:
• 8+ years' experience supporting or performing a Business Continuity Management or IT Disaster Recovery role.
• Bachelor's degree and/or the equivalent combination of education and experience.
• Understanding of Cloud infrastructure, database, and application development and design.
• Independent, action-oriented, and engagement-focused approach to identifying ways to improve resiliency.
• Functional knowledge of frameworks such as NIST, ISO 27031 & ISO 22301, COBIT, and ITIL.
• Experience working with SRE, DiRT, and Chaos Engineering practices.
• Thorough knowledge and understanding of business continuity and disaster recovery planning techniques, technologies, and best practices, and of the methods used in performing risk analyses and business impact analyses.
• Strong familiarity with AWS services relevant to DR/HA and resilient architectures, including AWS Config, CloudFormation, Load Balancers, Autoscaling, AWS Resilience Hub, and AWS Elastic Disaster Recovery.
• Experience working with enterprise Risk Management solutions (such as ServiceNow, Archer, Resolver, etc.).

Desired Qualifications:
• Domain knowledge of Chaos Engineering / Fault Injection and Disaster Recovery best practices.
• Skilled in compute and storage topology, design, and administration in a Microsoft/Unix/Linux environment.
• Understanding of AWS Regions and Availability Zone concepts, including the relationship of various AWS services (EC2, S3, IAM, RDS, etc.).
• Experience working in an enterprise IT environment (on prem & cloud), evaluating IT system resiliency via recovery plans inclusive of logical and physical (Visio) diagrams.
• Possesses strong analytical skills to effectively influence recommendations and decision-making, assess impacts, compare solutions, problem solve, and achieve business and/or technical objectives.
• Ability to act as a change agent who leads and welcomes innovative ideas and drives continuous improvement and service optimization.

Desired Certificates:
• Information Technology Infrastructure Library (ITIL)
• AWS Certified Cloud Practitioner
• AWS Certified Solutions Architect

OTHER Qualifications:
• This is a remote-based position. Employee will be required to have appropriate local services to work from home on a full-time basis.
• Experience working with project management tools that support Agile environments (Jira/Confluence).
• Ability to work independently and without direct supervision.
• Clear written and verbal communication skills.
• Ability to travel 10% of the time.
• Pass required background check.

WORK ENVIRONMENT: The work environment characteristics described here are representative of those an employee encounters while performing the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
• This position is virtual and open to residents of the 50 states and Washington, D.C.

Compensation & Benefits: Stride, Inc. considers a person's education, experience, and qualifications, as well as the position's work location, expected quality and quantity of work, required travel (if any), external market and internal value when determining a new employee's salary level. Salaries will differ based on these factors, the position's level and expected contribution, and the employee's benefits elections. Offers will typically be in the bottom half of the range.
• We anticipate the salary range to be $81,045.75 to $201,088.80. The upper end of this range is not likely to be offered, as an individual's compensation can vary based on several factors. These factors include, but are not limited to, geographic location, experience, training, education, and local market conditions. Eligible employees may receive a bonus. Stride offers a robust benefits package for eligible employees that can include health benefits, retirement contributions, and paid time off.

The above job is not intended to be an all-inclusive list of duties and standards of the position. Incumbents will follow any other instructions, and perform any other related duties, as assigned by their supervisor. All employment is "at-will" as governed by the law of the state where the employee works. It is further understood that the "at-will" nature of employment is one aspect of employment that cannot be changed except in writing and signed by an authorized officer.
Job Type: Regular

Stride, Inc. is a Federal Contractor, an Equal Opportunity/Affirmative Action Employer and a Drug-Free Workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected Veteran status, or genetics, or any other characteristic protected by law.

Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities

The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information. 41 CFR 60-1.35(c)
Job Type
Full-time role
Skills required
Agile
Location
Nashville, TN
Salary
$81,045.75 to $201,088.80 (anticipated range stated in the job description)
Date Posted
August 3, 2024
Stride, Inc. is seeking a Senior Engineer for IT Disaster Recovery in Nashville, TN, responsible for ensuring the availability and resiliency of critical IT systems. The role involves designing disaster recovery plans, leading recovery efforts, and collaborating with various IT and business stakeholders.