Site Reliability Engineer/Admin - Local to Austin, TX - Onsite

Job Type: Contractor

Skills Required: CI/CD, GitHub, Jenkins +3

Location: Austin, Texas

Salary: No salary information was found.

Date Posted: October 1, 2025

Technology Consultants, Inc.

Technology Consultants, Inc. is seeking a Site Reliability Engineer/Admin to ensure the resilience and performance of cloud services in Austin, TX. This onsite role focuses on automation, monitoring, and compliance within the VA's Enterprise Cloud.

Job description

Site Reliability Engineer/Admin
Austin, TX - Onsite role
Duration: 6-12 months
Note: Make sure all the skills are reflected in your recent projects.

The Senior System Administrator/Site Reliability Engineer (SRE) in the VA's Enterprise Cloud is responsible for ensuring the resilience, performance, reliability, and compliance of mission-critical cloud services that support Veterans and VA stakeholders. This role bridges software engineering, systems engineering, and operations to deliver highly available, secure, and efficient cloud-based platforms aligned with VA's modernization strategy and federal compliance mandates, with a focus on reliability, performance, scalability, and automation.

Day-to-day tasks vary by organization and system, but the role's daily work generally falls into these categories:

• Proactively monitor system health, availability, and performance using observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
• Respond to alerts and incidents, triage issues, and perform root cause analysis (RCA).
• Lead on-call rotations to ensure 24/7 uptime and quick recovery from outages.
• Document incident reports and contribute to postmortems to prevent recurrence.
• Automate manual operational tasks such as deployments, scaling, and configuration using tools like Ansible, Terraform, or Puppet.
• Manage infrastructure as code (IaC) to ensure consistency across environments.
• Optimize CI/CD pipelines for reliable and repeatable software delivery.
• Build self-healing systems to minimize downtime.
• Conduct load and stress testing to validate system performance under peak demand.
• Establish and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
• Identify and reduce sources of latency, bottlenecks, and single points of failure.
• Work with development teams to design reliability, scalability, and fault tolerance into customer servers.
• Patch operating systems, containers, and dependencies to address vulnerabilities.
• Ensure compliance with organizational and regulatory requirements.
• Implement access controls, secrets management, and least-privilege principles.
• Monitor resource utilization (CPU, memory, storage, network) to anticipate scaling needs.
• Plan for growth by forecasting demand and preparing infrastructure accordingly.
• Optimize cloud costs by rightsizing instances, using autoscaling, and leveraging reserved/spot instances.
• Partner with software engineers to embed reliability practices into development.
• Mentor teams on best practices for observability, automation, and incident handling.
• Participate in blameless postmortems and contribute to knowledge-sharing sessions.
• Continuously evaluate new tools and technologies to improve system reliability.
• Design, monitor, and maintain Customer Servers to meet VA's 99.9%+ uptime and SLA requirements across multi-cloud and hybrid environments.
• Implement fault-tolerant and self-healing architectures leveraging automation.
• Develop and manage observability frameworks (logging, metrics, tracing) to detect, respond to, and remediate incidents quickly.
• Lead blameless postmortems and drive corrective actions to strengthen VAEC resilience.
• Engineer scalable automation pipelines for provisioning, patching, and compliance (e.g., Ansible, Terraform, Puppet, GitHub Actions).
• Reduce manual effort through self-service tools for operations teams.
• Monitor and optimize application and infrastructure performance to meet demand from VA Medical Centers, Enterprise Data Warehouses, and end users.
• Ensure latency, throughput, and resource utilization align with mission needs.
• Integrate VA 6500, NIST 800-53, FedRAMP, and Zero Trust requirements into daily operations.
• Partner with cybersecurity teams to enforce continuous ATO (cATO) practices and vulnerability remediation.
• Collaborate with Release Management, Engineering, and Operations teams to improve change management, deployment pipelines, and reliability practices.
• Drive the adoption of SRE principles (error budgets, SLIs, SLOs, SLAs) into VA's IT Service Management (ITSM) processes.
• Operate across VA's Enterprise Cloud (VAEC), on-premises data centers, and hybrid platforms, ensuring seamless integration and interoperability.
• Support workloads across AWS GovCloud, Microsoft Azure Government, and Oracle Cloud Infrastructure (OCI) where applicable.

Qualifications

• Must have 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
• Strong experience with Linux/Unix systems administration and troubleshooting.
• Proficient with cloud platforms (AWS and/or Azure), especially in deploying production workloads.
• Deep understanding of monitoring, metrics, alerting, and observability.
• Proficient in designing, implementing, and managing automation solutions using Ansible.
• Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
• Hands-on with containers and orchestration (Docker, Kubernetes, EKS, AKS).
• Familiarity with networking concepts (TCP/IP, DNS, TLS, VPCs, load balancing).
• Solid understanding of software development lifecycle (SDLC) and Agile methodologies.
• Comfortable participating in on-call rotations and handling high-priority incidents.
• Excellent analytical and problem-solving skills.
• Ability to work in cross-functional teams and communicate effectively with developers, operations, and leadership.
• A strong bias for automation and self-healing systems.
• Ownership mindset with a commitment to reliability and continuous improvement.

Preferred Qualifications

• AWS Certified SysOps Administrator or DevOps Engineer.
• Microsoft Certified: Azure Administrator or DevOps Engineer Expert.
• Certified Kubernetes Administrator (CKA).
• Experience in chaos engineering, capacity modeling, or SRE tooling.

Education

• Bachelor's degree in computer science, electronics engineering, or related technical discipline and 5+ years of work experience.
• Eight (8) years of additional relevant experience may be substituted for education (13 years total).

Sateesh

Requirements

• Must have 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering

• Strong experience with Linux/Unix systems administration and troubleshooting

• Proficient with cloud platforms (AWS and/or Azure), especially in deploying production workloads

• Deep understanding of monitoring, metrics, alerting, and observability

• Proficient in designing, implementing, and managing automation solutions using Ansible

• Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps)

• Hands-on with containers and orchestration (Docker, Kubernetes, EKS, AKS)

• Familiarity with networking concepts (TCP/IP, DNS, TLS, VPCs, load balancing)

• Solid understanding of software development lifecycle (SDLC) and Agile methodologies

• Comfortable participating in on-call rotations and handling high-priority incidents

• Excellent analytical and problem-solving skills

• Ability to work in cross-functional teams and communicate effectively with developers, operations, and leadership

• A strong bias for automation and self-healing systems

• Ownership mindset with a commitment to reliability and continuous improvement

• Bachelor's degree in computer science, electronics engineering, or related technical discipline and 5+ years of work experience

• Eight (8) years of additional relevant experience may be substituted for education (13 years total)
