What you’ll do
- Implement and maintain monitoring systems to proactively identify and address potential issues before they impact users.
- Automate repetitive tasks and processes, such as deployments, infrastructure management, and incident response, to improve efficiency and reduce manual effort.
- Respond to incidents, diagnose problems, and implement solutions to restore service quickly.
- Improve the performance and scalability of systems and applications, ensuring they can handle peak loads and user traffic.
- Help plan future capacity needs, ensuring that systems can accommodate growth and evolving requirements while remaining cost-efficient.
- Work closely with development teams to understand their needs, guide them, and ensure that systems are designed and deployed reliably.
- Build tools to codify and automate infrastructure operations.
- Define and track SLIs and SLOs to measure the performance and reliability of services.
- Assess and mitigate risks associated with deployments and infrastructure changes.
- Assist with the release and deployment processes, ensuring that changes are rolled out smoothly and reliably.

What you’ll bring
- 5+ years of professional experience in a fast-paced SaaS or similar business environment
- 3+ years of hands-on experience supporting production systems as a Site Reliability Engineer (SRE) or DevOps Engineer
- 3+ years of hands-on experience with cloud services and technologies (GCP, AWS, Azure, etc.)
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Proficiency in Infrastructure as Code (IaC) tools and methodologies (e.g., Terraform, Pulumi, Puppet)
- Proven ability to troubleshoot and resolve complex technical issues in distributed systems
- Ability to communicate effectively within the team and across the organization, sharing insights and updates and collaborating to achieve project goals

Preferred skills:
- Advanced working knowledge of GCP services such as GKE, GCS, and IAM
- Professional experience supporting containerized Java/JVM/Python services
- Experience with relational databases, particularly PostgreSQL
- 3+ years of professional experience designing, implementing, and/or administering CI/CD solutions (e.g., GitHub Actions, Buildkite, Jenkins)
- Strong SRE mindset with a focus on cloud networking and security best practices
- Strong software development skills, particularly with scripting languages (e.g., Python, Bash)
- Experience with system administration in general and Linux in particular
- Familiarity with the SOC 2 and ISO 27001 security frameworks
- Preference for candidates in the Pacific or Mountain time zone
Job Type
Remote role
Location
San Francisco, California
Date Posted
March 25, 2025
Pano is seeking a Site Reliability Engineer to join their Platform team in San Francisco, California, focusing on ensuring the reliability and performance of software systems. The ideal candidate will bridge the gap between development and infrastructure operations through automation and proactive problem-solving.