Technology Consultants, Inc. is seeking a Site Reliability Engineer/Admin to ensure the resilience and performance of cloud services in Austin, TX. This onsite role focuses on automation, monitoring, and compliance within the VA's Enterprise Cloud.
Site Reliability Engineer/Admin
Austin, TX - Onsite role
Duration: 6-12 months
Note: Make sure all the skills are reflected in your recent projects.

The Senior System Administrator/Site Reliability Engineer (SRE) in the VA's Enterprise Cloud is responsible for ensuring the resilience, performance, reliability, and compliance of mission-critical cloud services that support Veterans and VA stakeholders. This role bridges software engineering, systems engineering, and operations to deliver highly available, secure, and efficient cloud-based platforms aligned with VA's modernization strategy and federal compliance mandates, with a focus on reliability, performance, scalability, and automation.

Day-to-day tasks vary with the organizations and systems supported, but the role's daily work generally falls into these categories:
• Proactively monitor system health, availability, and performance using observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
• Respond to alerts and incidents, triage issues, and perform root cause analysis (RCA).
• Lead on-call rotations to ensure 24/7 uptime and quick recovery from outages.
• Document incident reports and contribute to postmortems to prevent recurrence.
• Automate manual operational tasks such as deployments, scaling, and configuration using tools like Ansible, Terraform, or Puppet.
• Manage infrastructure as code (IaC) to ensure consistency across environments.
• Optimize CI/CD pipelines for reliable and repeatable software delivery.
• Build self-healing systems to minimize downtime.
• Conduct load and stress testing to validate system performance under peak demand.
• Establish and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
• Identify and reduce sources of latency, bottlenecks, and single points of failure.
• Work with development teams to design reliability, scalability, and fault tolerance into customer servers.
• Patch operating systems, containers, and dependencies to address vulnerabilities.
• Ensure compliance with organizational and regulatory requirements.
• Implement access controls, secrets management, and least-privilege principles.
• Monitor resource utilization (CPU, memory, storage, network) to anticipate scaling needs.
• Plan for growth by forecasting demand and preparing infrastructure accordingly.
• Optimize cloud costs by rightsizing instances, using autoscaling, and leveraging reserved/spot instances.
• Partner with software engineers to embed reliability practices into development.
• Mentor teams on best practices for observability, automation, and incident handling.
• Participate in blameless postmortems and contribute to knowledge-sharing sessions.
• Continuously evaluate new tools and technologies to improve system reliability.
• Design, monitor, and maintain Customer Servers to meet VA's 99.9%+ uptime and SLA requirements across multi-cloud and hybrid environments.
• Implement fault-tolerant and self-healing architectures leveraging automation.
• Develop and manage observability frameworks (logging, metrics, tracing) to detect, respond to, and remediate incidents quickly.
• Lead blameless postmortems and drive corrective actions to strengthen VAEC resilience.
• Engineer scalable automation pipelines for provisioning, patching, and compliance (e.g., Ansible, Terraform, Puppet, GitHub Actions).
• Reduce manual effort through self-service tools for operations teams.
• Monitor and optimize application and infrastructure performance to meet demand from VA Medical Centers, Enterprise Data Warehouses, and end users.
• Ensure latency, throughput, and resource utilization align with mission needs.
• Integrate VA 6500, NIST 800-53, FedRAMP, and Zero Trust requirements into daily operations.
• Partner with cybersecurity teams to enforce continuous ATO (cATO) practices and vulnerability remediation.
• Collaborate with Release Management, Engineering, and Operations teams to improve change management, deployment pipelines, and reliability practices.
• Drive the adoption of SRE principles (error budgets, SLIs, SLOs, SLAs) into VA's IT Service Management (ITSM) processes.
• Operate across VA's Enterprise Cloud (VAEC), on-premises data centers, and hybrid platforms, ensuring seamless integration and interoperability.
• Support workloads across AWS GovCloud, Microsoft Azure Government, and Oracle Cloud Infrastructure (OCI) where applicable.

Qualifications:
• Must have 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
• Strong experience with Linux/Unix systems administration and troubleshooting.
• Proficient with cloud platforms (AWS and/or Azure), especially in deploying production workloads.
• Deep understanding of monitoring, metrics, alerting, and observability.
• Proficient in designing, implementing, and managing automation solutions using Ansible.
• Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
• Hands-on with containers and orchestration (Docker, Kubernetes, EKS, AKS).
• Familiarity with networking concepts (TCP/IP, DNS, TLS, VPCs, load balancing).
• Solid understanding of software development lifecycle (SDLC) and Agile methodologies.
• Comfortable participating in on-call rotations and handling high-priority incidents.
• Excellent analytical and problem-solving skills.
• Ability to work in cross-functional teams and communicate effectively with developers, operations, and leadership.
• A strong bias for automation and self-healing systems.
• Ownership mindset with a commitment to reliability and continuous improvement.

Preferred Qualifications:
• AWS Certified SysOps Administrator or DevOps Engineer.
• Linux Certified: Azure Administrator or DevOps Engineer Expert.
• Certified Kubernetes Administrator (CKA).
• Experience in chaos engineering, capacity modeling, or SRE tooling.

Education:
• Bachelor's degree in computer science, electronics engineering, or a related technical discipline and 5+ years of work experience.
• Eight (8) years of additional relevant experience may be substituted for education (13 years total).

Sateesh